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A  PROJECTED  LAGRANGIAN  ALGORITHM  FOR  NONLINEAR  MINIMAX  OPTIMIZATION 

by 

Walter  Murray 
and 

Michael  L.  Overton* 

ABSTRACT 

The  minimax  problem  is  an  unconstrained  optimization  problem 
whose  objective  function  is  not  differentiable  everywhere,  and  hence 
cannot  be  solved  efficiently  by  standard  techniques  for  unconstrained 
optimization.  It  is  well  known  that  the  problem  can  be  transformed 
into  a  nonlinearly  constrained  optimization  problem  with  one  extra 
variable,  where  the  objective  and  constraint  functions  are  continuously 
differentiable.  This  equivalent  problem  has  special  properties  which 
are  ignored  if  solved  by  a  general-purpose  constrained  optimization 
method.  The  algorithm  we  present  exploits  the  special  structure  of  the 
equivalent  problem.  A  direction  of  search  is  obtained  at  each 
iteration  of  the  algorithm  by  solving  an  equality-constrained  quadratic 
programming  problem,  related  to  one  a  projected  Lagrangian  method  might 
use  to  solve  the  equivalent  constrained  optimization  problem.  Special 
Lagrange  multiplier  estimates  are  used  to  form  an  approximation  to  the 
Hessian  of  the  Lagrangian  function,  which  appears  in  the  quadratic  program. 
Analytical  Hessians,  finite-differencing  or  quasi-Newton  updating  may 
be  used  in  the  approximation  of  this  matrix.  The  resulting  direction  of 
search  is  guaranteed  to  be  a  descent  direction  for  the  minimax  objective 
function.  Under  mild  conditions  the  algorithms  are  locally  quadratically 


convergent  if  analytical  Hessians  are  used. 


1.  Introduction. 

The  problem  of  concern  is 

MMP:  min  {FM(x)  |  x  e  R11}  , 
x 

where  FM(x)  =  {f^Gc)*  i  =  1,2,. ..,m} 

n  X 

and  the  functions  f^:  ET-»  R  are  twice  continuously  differentiable. 

The  function  F„(x)  is  called  the  minimax  function  and  MMP  is  usually 
M  ' 

referred  to  as  the  minimax  problem.  The  minimax  problem  is  an  unconstrained 
optimization  problem  in  which  the  objective  function  has  discontinuous  deriva¬ 
tives.  Moreover,  any  solution  is  usually  at  a  point  of  discontinuity  and  con¬ 
sequently  it  is  inappropriate  to  use  any  of  the  known  powerful  methods  for 
unconstrained  minimization  to  solve  MMP.  An  equivalent  problem  to  MMP  is  the 
following  nonlinearly  constrained  problem  in  which  both  the  objective  and 
constraint  functions  are  twice  continuously  differentiable: 

EMP:  min  {xn+1  I  x  e  K0*1} 
x 

subject  to  c^(x)  >0,  i  =  1,2,..., m, 

where  c^x)  =  xq+1  -  f^Cx),  i  =  l,2,...,m, 
and  xT=  (x1 ,  *n+1)  • 

We  could  solve  EMP  using  one  of  the  many  methods  available  for  the  general 
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constrained  optimization  problem: 


NCP:  min  {.’^(x)} 
x 

(g) 

subject  to  c^' (x)  >  0,  i  =  l,2,...,m  , 

(g  )  (g  ) 

where  F'  and  {c^  }  are  arbitrary  twice  continuously  differentiable 

functions.  It  will  be  shown,  however,  that  a  method  can  be  derived  that 

exploits  the  special  structure  of  BMP. 

The  primary  special  feature  of  EMP  from  which  many  other  properties 

follow  is  that  the  minimax  function  F„  is  itself  a  natural  merit  function 

M  - - - ■ - - 

which  can  be  used  to  measure  progress  towards  the  solution  of  EMP.  For 
problem  NCP,  in  general  such  a  natural  merit  function  is  not  available,  and 
it  is  necessary  to  introduce  an  artificial  one  such  as  a  penalty  or  aug¬ 
mented  Lagrangian  function  to  weigh  the  constraint  violation  against  the 
decreasing  of  the  objective  function,  or  a  barrier  function  to  enforce 
feasibility.  All  these  merit  functions  require  the  definition  of  a  para¬ 
meter  which  is  to  some  degree  arbitrary  and  its  selection  can  prove  difficult. 
In  the  case  of  penalty  and  augmented  Lagrangian  functions,  difficulties  may 
also  arise  because  often  the  global  minimum  of  the  merit  function  is  not 
the  solution  of  the  original  problem. 

The  method  we  adopt  to  solve  MMP  essentially  consists  of  two  steps 
at  each  iteration: 

(l)  Obtain  a  direction  of  search  by  solving  and  perhaps  modifying 
an  equality-constrained  quadratic  programming  problem  (QP),  related  to  one 
a  projected  Lagrangian  algorithm  might  vise  to  solve  EMP.  This  procedure  is 

S 
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described  in  full  in  subsequent  sections. 

(2)  Take  a  step  along  the  search  direction  which  reduces  the 
minimax  function.  Because  the  minimax  function  is  not  differentiable, 
it  is  important  for  efficiency  to  use  a  special  line  search 
algorithm. 

Projected  Iagrangian  algorithms  for  solving  the  general  problem  NCP 
via  successive  quadratic  programs  have  been  proposed  or  analyzed  by  a 
number  of  authors  including  Wilson  (1963) ,  Murray  (1969a),  Robinson  (197U), 
Wright  (l97 6),  Han  (l977a),  Powell  (1977),  and  Murray  and  Wright  (l978). 

We  make  further  comments  on  the  extent  of  the  implications  of  the  special 
structure  of  EMP,  and  hence  the  relationship  of  our  algorithm  to  these 
algorithms  for  the  general  problem,  in  Section  15. 

A  number  of  other  algorithms  have  been  proposed  for  solving  the  non¬ 
linear  minimax  problem.  Our  approach  is  most  closely  related  to  those  due 
to  Han  (l977b)  and  Conn  (l979)»  We  will  discuss  these  further  in  Section  12, 
after  our  algorithm  has  been  described  in  full. 

An  important  special  case  of  MMP  is  the  problem  of  minimizing  the 

i  norm  of  a  vector  function  f(x)  e  R™  : 

00 

l  P:  min  fF  (x)  |  x  e  E11] 

00  -CO  J 

where  F  (x)  =  max  f|  f,  (x)  |,  i  =  1,2,..., m]  . 

00  X  J 

Handling  this  case  in  a  special  manner  presents  no  essential  difficulties. 
However,  in  order  to  avoid  unnecessarily  complicated  notation,  we  postpone 
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discussion  of  this  until  Section  11. 

We  note  that  no  convexity  assumptions  are  made  about  the  functions 
f^ (x)  .  The  difficulties  of  finding  global  minima  without  convexity 
assumptions  are  well  known  -  -  we  concern  ourselves  only  with  local 
minima. 


1.1  Notation. 

* 

*  _ 

Define  x  to  be  a  solution  of  EMP  .  It  follows  that  x  ,  the 

* 

vector  composed  of  the  first  n  elements  of  x  ,  is  a  solution  to  MMP 
* 

“d  W  fm(J)  * 

(v\  *  —(v) 

Let  x  denote  the  k-th  approximation  to  x  and  x  the  k-th 
* 

approximation  to  x  .  In  general,  we  will  use  a  —  placed  above  a  vector 
to  denote  the  vector  composed  of  the  first  n  elements  of  the  vector  without 


the  -  . 


At  each  iteration  of  the  algorithm,  x 


(k*l) 


is  obtained  by 


setting 


i(k>.  „  5  and  *  <*fl) 


where  p  is  the  direction  of  search  and  a  ,  a  positive  scalar,  is  the 

steplength.  Note  that  this  choice  of  xn+i^+1^  immediately  guarantees 

(k) 

that  all  the  points  fx  }  are  feasible  for  problem  EMP,  i.e. 

(k) 

'x  )  >0  ,  i  =  l,»».  ,m  • 

At  any  point  x  we  define  an  active  set  of  constraints  of  EMP  as  those 

* 

which  we  think  will  have  the  value  zero  at  the  solution  x  ,  based  on  the 
information  at  x  .  This  set  will  usually  include  all  constraints  with  the 
value  zero  at  the  point  x  and  may  also  include  seme  with  positive  values. 
The  exact  procedure  for  initially  selecting  the  active  set  at  each  iteration 
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will  be  discussed  in  Section  10,  and  procedures  for  modifying  this  choice 

will  be  described  in  Sections  5«2  and  6.  We  define  t(=  t(x))  to  be  the 

number  of  active  constraints  at  x  ,  and  write  the  vector  of  active  con- 

straints  as  c(x)  e  R  .  We  similarly  define  f(x)  as  the  vector  of 

active  functions  corresponding  to  the  active  constraints,  i.e.,  those 

*  * 

functions  expected  to  have  the  value  Ij|x)  at  x  .  Let  V(x)  be  the 
n  x  t  matrix  whose  columns  fv^ (x)}  are  the  gradients  of  the  active 

A  A 

functions,  and  let  A(x)  be  the  (n+l)x  t  matrix  whose  columns  fa  (x)} 
are  the  gradients  of  the  active  constraints.  Thus 


A(x)  = 


*  t  t  t 

where  e  «  e  R 


A 

=  fa1(x). 


•  at(x)] 


1 


We  define  Y(x)  to  be  a  matrix  with  orthonormal  columns  spanning  the  range 
space  of  A(x),  and  Z(x)  to  be  a  matrix  with  orthonormal  columns  spanning 

a 

the  null  space  of  A(x)  .  Let  I  be  the  identity  matrix  of  order  s  . 

s 

A 

Provided  A(x)  has  full  rank,  we  have  that  Y(x)  has  dimension  (n+l)  x  t  , 
Z(x)  has  dimension  (n+l)  X  (n+l-t)  ,  and 

Y(x)T  Y(x)  =  It,  Z(x)T  z(x)  =  In+lt  , 

Y(x)T  Z(x)  =  A(x)TZ(x)  =  0  . 

Let  en+1  =  (0,...,0,l)T  e  H11*1  .  The  Lagrangian  function  for 

problem  EMP  is  given  by 
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L(x,X)  -  Vl' 


T 

X 


c(x) 


where  \  e  R*'  is  a  vector  of  Lagrange  multipliers.  The  gradient  of 
L(x,\)  with  respect  to  x  is  en+^  -  AX.  We  define  the  (n+l)  x  (n+l) 
matrix  W(x,\ )  to  he  the  Hessian  of  the  Iagrangian  function  with  respect 
to  x  .  Thus 

t  (x) 

W(x,\)  -X±  V2ci(x) 

i=l 


where 


[w(x,x)  o 
0  °. 

t  (x) 

»G,X)  -£ 

i=l 


The  term 'projected  Hessian  of  the  Lagrangian  function "is  used  to  indicate 

a 

projection  into  the  null  space  of  A(x),  i. e. ,  the  matrix 

Z(x)T  W(x,x)z(x)  .  This  matrix  may  also  be  written  Z(x)^  W(x,^)  Z(x)  , 

where  Z(x)  consists  of  the  first  n  rows  of  Z(x)  . 

A  A 

Often  we  will  emit  the  arguments  from  c,  A  ,  Z,  etc.  when  it  is 

(v)  +  #*  # 

clear  that  they  are  evaluated  at  x  .We  use  the  notation  V,  A,  Z  , 

A  A  ^ 

etc.  to  denote  V,  A,  Z,  etc.  evaluated  at  x  with  the  active  set 

correctly  chosen,  i.e. ,  consisting  of  all  those  constraints  with  the  value 
* 

zero  at  x  . 
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1.2  Necessary  and  Sufficient  Conditions. 


In  the  following  we  shall  refer  to  the  first-  and  second-order 

constraint  qualifications  and  the  necessary  and  sufficient  conditions 
* 

for  a  point  x  to  be  a  local  minimum  of  the  general  problem  NCP  as 

defined  in  Fiacco  and  McCormick  (l968).  The  conditions  for  x  to  be  a 

* 

local  minimum  of  BMP  (and  hence  x  of  MMP)  are  simplifications  of 
these  general  conditions.  The  main  simplification  is  that  the  firBt- 
order  constraint  qualification  always  holds  for  EMP.  To  see  this 
observe  the  following.  For  any  point  x  let  p  be  any  nonzero  vector 


satisfying 


a^(x)Tp  >  0  for  all  i  s.t.  c^(x)  =  0 


where  the  vector  a^  is  the  gradient  of  c^  .  Then  p  is  tangent  at 
9=0  to  the  locally  differentiable  feasible  arc 


(0)  =  x  +  Qp 


max  fFM(i  +  ©p),  xn+1  +  0Pnfl) 


The  first-order  conditions  therefore  reduce  to  the  following  (see  Demyanov 
and  Malozemov  (1974)  for  an  alternative  derivation  applied  directly  to  MMP) 

First-order  necessary  condition. 


If  x  is  a  local  minimum  of  EMP  then  there  exists  a  vector  of 
Lagrange  multipliers  t  e  such  that 

Vi  -  «- 0 


and  X  >  0  . 


Two  conditions  which  are  equivalent  to  (l.l)  are  that  x  is  a 
stationary  point  of  L(x,x)  with  respect  to  x  and  that  Z e  0  . 

Note  that  (l.l)  implies  that  V  is  rank  deficient  and  that  the  sum  of 
the  multipliers  is  one. 

Hie  second-order  constraint  qualification  does  not  necessarily  hold 
for  EMP  (for  example  at  the  origin  for  f^=  x^,  fg=  -  x^  ,  and  f  =  -  x^). 

We  therefore  include  this  assumption  in  the  statement  of  the  second-order 
necessary  condition. 

Second-order  necessary  condition. 

* 

If  x  is  a  local  minimum  of  EMP  and  the  second-order  constraint 
qualification  holds,  then  Z  W(x,X)Z  ,  the  projected  Hessian  of  the 
Lagrangian  function,  is  positive  semi -definite. 

Sufficient  condition. 

* 

If  the  first-order  necessary  condition  holds  at  x  ,  the  Lagrange 

multipliers  are  all  strictly  positive,  i.e.  \  >  0  ,  and  Z  w(x,\)Z  is 

* 

positive  definite,  then  x  is  a  strong  local  minimum  of  problem  EMP. 

* 

Thus  in  terms  of  problem  MMP,  F.,(x)  <  F„(x)  for  all  x  such  that 
*  MM 

|x  -  x|  <  6  ,  for  some  fi  >  0  . 

Note  that  in  the  case  where  all  the  f.  are  linear  it  is  well  known 

1  * 

that  a  solution  must  exist  with  n+1  active  functions  at  x  (see  Cheney  (l966) 

for  the  case  f^P).  Then  normally  Z  is  null  and  therefore  the  second- 

order  conditions  are  also  null.  The  nonlinear  problem,  however,  can  have 

* 

a  unique  solution  with  anything  from  1  to  n+1  functions  active  at  x  . 

This  relationship  is  exactly  analogous  to  that  between  linear  and  nonlinear 
programming.  For  coranents  on  the  special  case  of  l  approximation  and 
the  meaning  of  the  Haar  condition,  see  Section  11. 
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2. 


Clearly  it  is  desirable  that  at  every  iteration  the  search  direction 

p  be  a  descent  direction  for  F  ,  i.e. 

M 

p)  <  o  ,  where  Ffa(x^ ,  p)  is  the 

directional  derivative  lijn  ^  (f(x^+  hp)  -  f(x^ ) ) 

h-4  0  “ 

An  equivalent  condition  is  that  p  is  a  descent  direction  for  each 

function  f±  for  which  f±(x^)  =  lj|x^)  (i.e.  c^x^)  =  o  )  . 

A  second  desirable  property  for  p  arises  from  considering  the  active 

set  which  consists  of  those  constraints  corresponding  to  functions  we 

*  * 

expect  to  have  the  value  F|x)  at  x  .  We  wish  to  choose  p  so  that 

the  first  order  change  in  these  functions  predicts  that  they  will  all 

-(k)  - 

have  the  same  value  at  x'  +  p  .  An  equivalent  condition  is: 


a(x(*Vp -.;(*<*>) 

and  hence 

Jis<lt>>  *  ^Glk))TP  -  f W)  *  Iwl, 

for  some  value  pn+1  (  the  (n+l)st  component  of  p  ).  If  the  active  set 


(2.1) 


included  all  the  constraints  which  are  sero  at  x^ 


>  then  the  condition 


also  ensures  that  p  is  a  descent  direction  for  F^  .  in  fact,  at 
every  iteration  the  active  set  will  initially  include  all  such  constraints, 
but  it  may  be  desirable  to  drop  one  or  more  of  them  from  the  set  to  move 
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off  a  constraint.  Since  the  decrease  in  Fw  is  limited  to  the  smallest 
decrease  in  any  of  the  functions  corresponding  to  c^x^)  =  0  ,  it  is 

also  desirable  to  insist  that 

ai(x^bT  p  >  0  for  all  i  such  that  c^(x^)  =  0  .  (2. 

Conditions  (2.2)  and  (2.3)  ensure  that  p  is  a  first-order 
feasible  descent  direction  with  respect  to  problem  EMP.  It  is 
straightforward  to  show  the  following  from  the  above  remarks. 

Theorem  1.  If  (2.2)  and  (2.3)  hold,  i.e.,  p  is  a  first-order 
feasible  descent  direction  for  EMP,  then  p  is  a  descent  direction 
for  F^  and  hence  a  sufficiently  small  step  along  it  must  result  in 
a  reduction  in  F,_  . 

Note  that  the  above  does  not  guarantee  that  p  is  a  feasible 
direction  for  EMP,  as  illustrated  by  Figure  1.  This  causes  no 

(V+T  )  _ (]r)  . 

difficulty  since  xn+^  is  set  to  F(x  )  and  hence  it  is 

always  possible  to  obtain  a  lower  feasible  point  for  EMP  if  (2.2) 

and  (2.3)  hold.  Consequently  it  is  important  to  look  for  a  reduction 

in  F  in  the  line  search  and  not  in  a  penalty  function  constructed 
M 

for  EMP.  In  Figure  1, 


F  (x^k) 
FM'  1 


P- 


)  < 


F  (x^) 
*M  ’ 


but  a  penalty  function  may  have  a  much  higher  value  at  x  +  p 
than  at  x^  since  x^+  p  is  infeasible  for  EMP  . 
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1 


Example  for  n=m=l  where  p  is  first-order  f€ 
but  not  feasible  for  EMP. 

FIGURE  1  . 
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Thus  we  see  that  the  importance  of  EMP  is  as  a  device  to  obtain  a 
search  direction  p  along  which  can  be  reduced  in  the  line  search. 

We  emphasize  again  that  we  wish  (2.2)  and  (2.3)  to  hold  so  that  p  is  a 
descent  direction  for  FM  ,  and  that  the  active  set  nature  of  the 
algorithm  indicates  that  (2.l)  should  also  hold.  It  is  not  reasonable 
to  carry  (2.1)  one  step  further  and  demand  that  the  second  order  change 
in  the  active  functions  predict  they  all  have  the  same  new  value,  since 
this  would  require  computing  and  storing  each  of  the  Hessians  of  the 
active  functions,  a  process  which  is  unwarranted  especially  since  it  may 
not  be  possible  to  solve  the  resulting  equations  for  p  .  It  is  the 
case  however  that  (2.l),  (2.2)  and  (2.3)  will  usually  not  uniquely 
define  p  ,  and  in  the  next  section  we  utilize  properties  of  EMP  to 
obtain  an  initial  choice  of  p  by  solving  a  quadratic  program  (qp)  based 
on  second  order  information  incorporated  in  an  approximation  to  the 
Lagrangian  function.  The  solution  to  this  QP  may  not  always  satisfy  (2.l), 
(2.2)  and  (2.3)>  and  in  subsequent  sections  we  discuss  how  to  modify  the 
initial  choice  to  obtain  a  satisfactory  search  direction. 


3.  The  6P  Subproblem 

The  solution  of  EMP  is  at  a  minimum  of  the  Lagrangian  function  in 

* 

the  null  space  of  the  active  constraint  Jacobian  at  x. 

The  usual  method  for  solving  a  general  linearly  constrained  problem 
is  to  approximate  the  objective  function  by  a  quadratic  function  and 
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then  determine  the  search  direction  by  solving  some  appropriate 
quadratic  program  (CP).  Consider  therefore  the  quadratic  program 

min  L(*(k),  x(k))  ♦  (e  -  i(x(k> )X (k) )Tp  * 

P 

|  PT  W(x(k),x(k>)p 

subject  to  A(x^)Tp  =  -  c(x^)  . 

f  k)  -#■ 

where  X  is  an  approximation  to  A. 

An  equivalent  QP  is  given  by 

QP1  :  min  |  pT  W(x(k),X (k) )p  +  e  Tp 
P 

subject  to  A(x^)Tp  =  -  c(x^)  . 

(k)  (k)  ~ 

Let  us  drop  the  arguments  x  '  and  A  from  A  and  W,  and 
let  Y  and  Z  be  the  orthogonal  matrices  defined  in  Section  1.1. 
The  matrices  Y  and  Z  may  be  determined  from  the  QR  factorization 

A 

of  A; 

A 

a  =  q  r  R 

L° 


to 

SH 

II 

1 - 

K 

1 _ 

- 1 

o 

- _ 1 
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where  R  is  an  upper  triangular  matrix  of  order  t  .  If  A  has  full 
T 

rank  and  Z  WZ  is  positive  definite,  then  the  unique  solution  of  QP1 
can  be  expressed  as  the  sum  of  two  orthogonal  components: 


We  have 


p  =  Ypy  +  Zpz 

where  p„  e  and  p„  e  Rn+^_t 

JL 

Am  m  /v 

A  p  =  R  pY  =  -  c 


(3-1) 


(3-2) 


and  p  Y  is  determined  entirely  by  the  constraints  of  QP1. 

The  vector  pz  is  given  by  the  solution  of 

(ZTWZ)pz  =  -  ZT(en+1  +  WYpy)  (3.3) 

(see  Murray  and  Wright  (l978)). 

In  subsequent  sections  we  will  also  wish  to  refer  to  a  related  Qp 
and  its  solution,  namely  the  one  with  the  same  quadratic  form  but  homo¬ 
geneous  constraints: 

QP2  :  min  |  pT  Wp  +  en+1Tp 

P 

subject  to  A(x^)Tp  =  0  . 

The  solution  to  this  is  given  by  p  =  Zqz  ,  where 

(ZTWZ)qz  =  -  ZTen+1  .  (3.4) 


\ 
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At  every  iteration  of  our  algorithm  an  attempt  is  made  to  set  the 

search  direction  p  to  the  solution  of  QP1,  but  for  various  reasons 

this  may  be  inadequate  (there  may  not  even  be  a  solution).  Much  of  the 

detailed  discussion  of  the  method  is  concerned  with  what  action  to  take 

in  these  circumstances.  It  is  important  to  realize  that  it  is  only  on 

attempting  to  solve  the  QP  that  these  inadequacies  are  revealed  and  if 

the  solution  is  not  sought  in  a  particular  manner,  certain  deficiencies 

are  not  ascertained.  We  are  restricted  in  the  action  we  may  take  by 

requiring  that  the  search  direction  satisfy  (2.l),  (2.2)  and  (2.3),  and  by 

a  need  to  limit  the  computational  effort  when  the  information  on  which 

it  is  based  has  proved  suspect.  We  also  wish  to  arrange  the  computation 

so  that  any  computational  efforts  already  invested  can  still  be  utilized 

should  the  initial  QP  prove  inadequate.  In  particular,  provision  must 

be  made  for  the  possibility  that  the  wrong  active  set  is  identified  at 
(k) 

x  .  We  will  always  insist  that  at  the  beginning  of  every  iteration 

the  potential  active  set  includes  all  constraints  with  zero  value  at 
(k) 

xv  (and  it  will  normally  also  include  constraints  with  positive  values). 
In  Section  5*2  and  6  we  discuss  how  a  constraint  may  then,  if  desired,  be 
deleted  from  the  active  set. 

For  the  moment  we  assume  that  analytical  Hessians  are  used  to  compute 

W  ,  but  in  Section  7  we  discuss  finite  difference  and  quasi-Newton 

alternatives.  Notice  that  the  definition  of  W  requires  an  estimate  of 

#  (k) 

the  Lagrange  multipliers  X  to  be  made  available  at  x  .  Before 
discussing  these  other  matters,  we  devote  the  next  section  to  the  subject 

of  multiplier  estimates. 
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U.  Lagrange  Multiplier  Estimates. 

Although  the  arguments  of  Section  3  amply  justify  defining  the 

basic  search  direction  p  as  the  solution  to  QP1,  it  is  not  nearly 

(k) 

as  clear  how  to  define  the  Lagrange  multiplier  estimate  at  x 
These  estimates  are  needed  to  define  the  matrix  W  and  to  determine 
whether  constraints  should  be  deleted  from  the  active  set.  Clearly 
it  is  better  to  use  new  information  obtained  at  the  current  point  x 
rather  than  use  the  multipliers  of  the  QP  solved  at  the  previous 
iteration. 

\T :  The  least  squares  estimate. 

Jj 


(k) 


The  most  obvious  multiplier  estimate  is  the  least  squares  solution 
to  the  overdeteimined  system  based  on  the  first  order  necessary  con¬ 
ditions.  Thus  we  define  Xl  te  the  solution  of  the  least  squares 
problem 


“in  l|A  X  -  en+1ll2  • 

X 

It  is  well-known  (see  Golub  (.1965)  )  that  the  least  squares  problem  can 

A 

be  solved  by  using  the  QR  factorization  of  A  ,  which  we  introduced  in 
the  last  section  to  solve  QPl.  The  estimate  \L  is  called  a  first-order 
multiplier  estimate  because 

hL-  gi  -  °<ii*<k)-  *jd 


where 

active  set. 


is  the  minimum  of  EMP  on  the  manifold  defined  by  the  current 
i.e.,  the  solution  of 

/v 

min  xr+^  subject  to  c(x)  =  0 


l6 


and  \M  is  the  corresponding  Lagrange  multiplier  vector. 

The  multiplier  estimate  has  the  following  property.  Suppose  for 
some  reason  p  is  unsatisfactory  and  we  wish  to  delete  a  constraint  from  the 
active  set.  If  (\L)j  <  0  and  we  delete  constraint  j  ,  then  the  steepest 
descent  direction  in  the  nullspace  of  the  new  active  constraint  Jacobian 
is  guaranteed  to  be  first-order  feasible  with  respect  to  the  deleted  con- 

~  A  A 

straint.  Define  A  as  A  with  a.  deleted,  and  Z  by 

tJ 

K  -  0 ,  71.  i .  [Z  <)  .  (U.D 

Then  the  steepest  descent  step  in  the  new  null  space  is  given  by 

1  k  -  -*?vi 

and  we  have 

A  m 

aj  Zs“>0 

The  Newton  step  in  the  null  space,  given  by  Z  qj  ,  where 

(Fw  FqJ  •  - 

does  not  in  general  satisfy 


See  Gill  and  Murray  (l9J9)  for  the  proof  of  these  statements  in  the 
context  of  linearly  constrained  optimization. 

\c :  A  special  estimate  for  problem  EMP. 

A  more  appropriate  estimate  than  \  can  be  obtained  by 
considering  the  special  structure  of  EMP.  The  least  squares  solution 


(4.2) 

(4.3) 

(4.4) 
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is  motivated  by  the  fact  that  the  overdetermined  set  of  equations 


A  X  =  en+1  is  only  an  approximation  to  the  set  of  equations  which 
hold  at  ,  the  minimum  on  the  manifold.  However,  the  (n+l)st 

/«.rp 

equation  e  \  =  1  is  exactly,  not  approximately,  the  equation  which 
holds  at  Xjj  .  We  therefore  define  another  multiplier  estimate  xc 
as  the  least  squares  solution  to  the  first  n  equations  subject  to 
the  constraint  e  x  =  1  .  Thus  xc  is  the  solution  to  the  con¬ 
strained  least  squares  problem 

min  || V  Xf  subject  to  e\  =  1  • 

X 

In  fact  we  can  show  that  \  is  exactly  xL  multiplied  by  a  scalar 
greater  than  or  equal  to  one. 

A 

Theorem  2.  Assume  A  has  full  rank  and  let  XT  and  \  be  defined 
-  L  C 

as  above.  Then  XQ  «  3  xL  >  where  3  =  >  1  . 

®  XL 

Proof.  If  V  does  not  have  full  rank  then  let  \  ^  0  be  such  that 

V  x  =  0  .  We  have  e i  /o  since  otherwise  A  X  =  0  .  Thus 

X„  =  XT  =  7s~"  ■  X  with  both  the  least  squares  and  the  constrained 

c  L  e  X 

least  squares  problems  having  zero  residuals,  and  the  result  follows. 

A 

Therefore  assume  that  V  has  full  rank.  The  vector  X  satisfies 

li 

.  /Ctc-i-1  :t 
XL  =  (A  A)  A  en+L 


/s  Am  1  A 

=  (VAV  +  ee  )  e 


It  follows  from  the  Sherman-Morrison  formula  (see  Householder  (l96k, 
p .  125)),  that 
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1  *T 

XT  =  d - =sr  d  e  d 

L  1+e  d 


Am 

l+eAd 


(4.6) 


where  AfpA  i  a 

d  =  (v  v)"e 

Note  that  eTd  >  0  since  (V^V ) is  positive  definite. 

Since  X  is  the' solution  to  a  constrained  optimization 
problem  it  follows  from  the  corresponding  first  order  necessary 
conditions  (writing  ||V  \JJ2  =  X^V  X)  that 


(4.7) 


2  VAV  X  c  =  pxe 

where  is  the  Lagrange  multiplier  associated  with  the  constraint 

**m 

e  X=  1  .  Since  e  d  >  0  and  e  Xc  =  1  we  have 


Xc  d 

e  d 


and  hence  the  result  follows.  Alternatively  the  equation  for  Xc  could 

have  been  obtained  by  scaling  the  last  equation  in  the  unconstrained 

least  squares  problem  and  observing  that  the  right  hand  side  of  (4.6), 

modified  to  include  the  scale  factor,  tends  to  d  as  the  scale  factor 

e  d 

goes  to  infinity.  Q 


Note  that  it  would  be  highly  ill-advised  to  compute  X  and  X~ 

Li  0 

(k) 

by  computing  d  via  (4.7).  If  x'  were  equal  to  x^  ,  the  minimum 

A  A 

on  the  manifold,  then  V  would  be  rank-deficient,  but  A  would  not 


19 


(k) 

in  general,  and  hence  if  x  is  close  to  the  condition  number 

of  V  V  may  be  much  bigger  than  that  of  A  A  .  Since  solving  the 

A 

least  squares  problem  using  the  QF  factorization  of  A  is  alrady  a 

AfflA 

better  conditioned  process  than  explicitly  using  A  A  (via  the 

A 

normal  equations)  it  is  clear  that  using  the  QF  factorization  of  A 
is  far  preferable  to  using  (4.7). 

Using  the  scaled  estimate  X  instead  of  \  will  result  in 

0  Jj 

different  decisions  about  whether  to  delete  constraints  from  the 
active  set  since,  as  will  be  explained  in  Section  6,  the  magnitudes 
as  well  as  the  signs  of  the  estimates  are  used  to  make  the  decision. 

Furthermore,  using  X  instead  of  X  in  general  increases  the 

C  1/ 

A 

magnitude  of  W  and  hence  affects  both  the  direction  (if  c  f  o) 
and  the  magnitude  of  the  solution  to  QP1.  The  following  example 
illustrates  that  it  may  often  be  beneficial  to  use  X  rather  than 

1j 

Xp  .  Let  n  =  m  =  1  and  F(x)  =  f^(x)  =  x^  .  Let  the  current 

—  (k)  ~  A  u 

estimate  of  the  solution  be  x  =  2  .  Then  V  =  4  and  A  =  [^  ]  , 

X^  =  ~  and  Xc  =  1  .  Using  Xc  for  W  results  in  the  exact  step 

to  the  solution  being  taken,  but  using  X^  results  in  one  seventeen 

times  too  big.  Clearly  similar  examples  can  be  constructed  with  a 

larger  number  of  active  constraints.  The  choice  of  X  over  XT 

essentially  arises  from  the  fact  that  problem  MMP  is  in  some  sense 

naturally  scaled  —  the  functions  of  MMP  cannot  be  individually  scaled 

without  changing  the  solution  while  the  constraints  of  NCP  can  be  in¬ 


dividually  scaled  in  general. 
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XjjJ  A  second  special  estimate. 

The  estimate  X  is  not  the  only  multiplier  estimate  that  might 

C 

be  constructed  to  take  the  special  form  of  EMP  into  account.  Since  we 
-  (k) 

know  that  V  would  be  rank-deficient  if  x  were  the  minimum  on  the 
manifold,  a  reasonable  alternative  would  be  to  assume  that  V  is  close 
to  being  rank-deficient  and  define  X^  as  the  solution  to  an  appropriate 
rank  deficient  least  squares  problem,  scaled  so  that  e  XD  =  1  «  Let  the 

A 

QR  factorization  of  V  with  column  pivoting  be 


.i;]. 

where  Q^  is  an  orthogonal  matrix,  II  is  a  permutation  matrix,  s  is 
a  vector  and  r  is  a  scalar.  If  t  =  0  then  the  vector 

,■ . .  j-v.-j 

and  its  multiples  are  solutions  to  the  rank-deficient  least  squares 

A 

problem  VX  ca  0  .  Therefore  let 

i 

e  X 


The  estimate  X_  is  not  a  multiple  of  XT  in  general  and  if  a 
constraint  is  deleted  corresponding  to  (XD)j  <  0  ,  it  is  not  true  that 
the  steepest  descent  step  satisfies  (4.3).  This  alone  makes  the  use  of 
Xp  questionable.  Furthermore,  to  obtain  X^  it  would  be  necessary  that 
the  QR  factorization  of  V  be  done  with  column  pivoting  (otherwise  it 
makes  no  sense  to  ignore  j)  and  this  means  it  could  not  be  obtained 
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by  simply  updating  the  factorization  of  A  .  The  main  flaw  of  is 

A 

that  it  ignores  the  information  given  by  T  even  though  V  may  not 
be  nearly  rank  deficient.  For  these  reasons  we  do  not  consider  xD  *ny 
further. 


:  A  second-order  estimate. 


The  second-order  Lagrange  multiplier  estimate 


is  defined  as 


the  exact  multipliers  corresponding  to  the  solution  p  of  QP1,  i.e.  the 


solution  to  the  consistent  set  of  equations 


%  ‘  Vi  *  “p  • 


(4.7) 


where  p  is  given  by  (3- l) >  (3*2)  and  (3- 3 )  and  a  first-order  multi¬ 
plier  estimate  is  used  to  define  W  ,  The  necessity  of  requiring  W 
to  define  ^  implies  that  second-order  estimates  can  only  be  useful  in 
determining  whether  to  delete  constraints  from  the  active  set.  The 
estimate  is  called  second-order  because 


iii*  -  gi  -  o  (ii*(k)  -  */) 


If  a  constraint  is  deleted  corresponding  to  (n^.  <  0  >  then  (4.3)  will 

A 

not  hold  in  general.  If  c  =  0  ,  then  (4.5)  will  hold. 

The  system  (4.7)  is  consistent  because  of  the  definition  of  p  . 

Hote  that  because  it  is  consistent  there  is  no  question  of  a  second-order 
estimate  analogous  to  Xc  -  -  the  last  equation  is  already  satisfied  by  ^  . 
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:  A  first-order  estimate  guaranteeing  a  first-order  feasible 
Newton  step  after  deletion. 

The  multiplier  estimates  XL  and  ^  are  often  used  in  constrained 
optimization  and  are  discussed  in  some  detail  in  Gill  and  Murray  (l97 9). 
Here  we  note  that  there  is  yet  another  estimate  which  might  be  useful 
both  for  EMP  and  in  the  context  of  general  constrained  optimization. 

We  denote  it  vw  and  define  it  by 

H  -  Vi  *  m,h  ' 

where  Zqz  is  the  solution  to  QP2  as  defined  by  (3.4).  The  estimate 
Vy  is  the  vector  of  exact  multipliers  corresponding  to  the  solution  of 
QF2.  Unlike  the  case  with  XL  or  ,  if  a  constraint  is  deleted 

corresponding  to  a  negative  component  of  vw  ,  then  (4.5)  must  hold  even 
if  c  f  0  ,  i.e.  the  Newton  step  in  the  new  null  space  must  be  first- 
order  feasible  w.r.t.  the  deleted  constraint.  This  fact  follows  from 
the  proof  of  Theorem  4  in  Gill  and  Murray  (l979)>  where  (4.5)  is  shown 
to  hold  in  the  context  of  linear  constraints.  The  proof  is  applicable  to 
but  not  p^  in  the  context  of  nonlinear  constraints,  since  vw  is 
the  exact  multiplier  vector  for  a  QP  with  homogeneous  constraints. 

Notice  however  that  unlike  ^  is  only  a  first-order  estimate. 

Using  the  estimate  Xc  to  define  W  . 

Both  the  estimates  Xc  and  ^  will  be  used  to  decide  when  to 
delete  constraints  from  the  active  set,  as  will  be  discussed  in  Section  6. 
As  explained  there,  a  constraint  with  a  negative  component  of  Xc  will  not 
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necessarily  be  deleted  from  the  active  set,  since  the  multiplier 


estimates  may  not  be  reliable.  However,  we  also  use  \ c  to  define  W 
and  there  is  no  good  reason  to  include  in  W  a  term  with  a  negative 
component  of  X~  •  Therefore,  we  define  W  to  be 

w(x^)  ^(*(k)) 

i=l 


where 


eTX" 


X" 


and  is  defined  by  =  max(o,  (xc)i)  • 
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5-  Properties  of  Solution  of  QP  Subproblem. 


In  this  section  we  examine  the  properties  of  the  solution  to  QP1. 

Initially  we  assume  that  all  constraints  with  zero  value  are  included 

*  T 

in  the  active  set  and  that  A  has  full  rank  and  Z  WZ  is  positive 
definite  so  that  the  solution  p  is  given  by  (3*l),  (3-2)  and  (3.3)  and 
is  unique.  We  would  like  p  to  satisfy  (2.1),  (2.2)  and  (2.3)  • 

Clearly  the  constraints  of  QP1  ensure  that  (2.l)  tnd  (2.3)  hold.  Thus 
the  only  question  is  whether  p  is  a  descent  direction  for  EMP,  i.e. 
whether  (2.2)  holds.  If  all  the  active  constraints  have  the  value  zero 
then  the  following  applies: 

^  ^  IJI 

Theorem  3-  Suppose  that  c  =  0  ,  A  has  full  rank  and  Z  WZ  is  positive 
definite.  Then  p  ,  the  solution  of  QP1,  is  a  descent  direction  for 
EMP  provided  it  is  not  zero. 


Proof.  Since  A  has  full  rank  and  the  columns  of  Y  span  the  range  of 

A 

of  the  columns  of  A  ,  we  have  p^  =  0  .  Hence  p  =  Zpz  and 


T„T 
P7  Z  e 


h+1 


T  T 

=  -  PzVwzpz 


by  (3.3) 


T  T 

Since  Z  WZ  is  positive  definite,  e  _  p  must  be  negative  if  p  /  0, 

n+1  Lx 

i.e.  p/  0  .  Q 

fjl  A 

If  p  =  0  ,  then  by  (3-3)  z  en+i  =  0  511(1  hence  A  xL  =  en+1  is 
a  consistent  set  of  equations,  with  \  =  =  %  =  XM  •  Thus  either 

one  of  the  components  of  \  is  negative  or  zero,  or  the  first  and  second 

v 

(k) 

order  sufficiency  conditions  are  satisfied  and  x  is  a  solution 
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of  problem  EMP.  If  p  =  0  and  at  least  one  of  the  multipliers  is 

negative,  then  it  is  necessary  to  delete  a  corresponding  constraint 

from  the  active  set  to  obtain  a  descent  direction.  The  procedure 

for  doing  this  is  described  in  Section  6.  If  p  =  0  and  the  smallest 

(k) 

component  of  \r  is  zero,  the  point  x  may  or  may  not  be  a 
solution.  In  this  case  special  techniques  such  as  described  in  Gill  and 
Murray  (1977)  must  be  used  to  determine  whether  to  treat  the  correspon- 
ing  constraint  as  active  or  not.  These  will  not  be  discussed  any  further 
here. 

In  the  next  three  subsections  we  discuss  the  actions  which  it  may 
be  necessary  to  take  to  obtain  a  search  direction  which  satisfies  (2.l), 

A  A 

(2.2)  and  (2.3)  when  we  drop  the  assumptions  that  c  =  0  ,  A  has  full 

T 

rank  and  Z  WZ  is  positive  definite. 

5.1  Positive  Active  Constraints. 

A 

In  practice  it  will  rarely  be  the  case  that  c  =  0  ,  so  we  now  drop 
this  assumption.  We  note  that  if  we  were  sufficiently  restrictive  in 
the  definition  of  the  active  set  (e.g.  choose  only  one  constraint  active) 
then  we  could  force  this  condition  to  be  true.  As  will  be  shown  in 
Section  10, however,  it  is  important  for  the  efficiency  of  the  algorithm 
not  to  be  too  restrictive  in  the  definition  of  the  active  set.  This  may 
appear  to  negate  the  significance  of  Theorem  3,  but  this  is  not  the  case. 
Although  dropping  the  assumption  certainly  means  that  Theorem  3  no  longer 
holds, since  both  Yp^  and  Zpz  could  be  ascent  directions , we  can  infer 

A 

that  if  ||c||  is  small  (and  it  approaches  zero  near  the  solution),  then  p 
is  likely  to  be  a  descent  direction.  If, having  solved  QPl,it  transpires 


T 

that  p  >  0  ,  then,  provided  either  YpY  or  Zpz  is  a  descent 

direction,  a  suitable  descent  direction  can  be  chosen  as  follows: 

YPy  *"  V  ZPZ  if  Vl^Y  <  ° 

Y  fPy  ♦  ZPZ  if  en_iTZpz  <  0 

for  some  y  satisfying  0  <  y  <  1  .  If  neither  component  is  a  descent 
direction  then  a  further  possibility  is  to  replace  Zpz  by  the  solution 

m 

of  QE2,  i.e.  Zqz  where  qz  is  given  by  (3.4)  .  Provided  z  en+1r  0, 
the  vector  Zq„  is  a  descent  direction,  and  hence  so  is  Yyp  +  zq„  for 

U  1  Li 

some  Y,  o  <  Y  <  1  . 

T 

When  Z  en+j  =  0  and  Ypy  is  an  .scent  direction,  it  is  necessary 
to  delete  a  constraint  to  obtain  a  descent  direction.  One  possibility 

A 

would  be  to  delete  constraint  j  (say)  where  is  the  largest  component 

A 

of  c  ,  since  it  is  not  strictly  necessary  to  be  concerned  about  whether 

the  resulting  search  direction  has  a  positive  inner  product  with 

((2.3)  applies  only  to  constraints  with  value  zero).  However,  clearly 

this  may  lead  to  a  very  small  step  being  taken  along  the  search  direction 

with  this  constraint  being  forced  immediately  back  into  the  active  set. 

The  following  result  shows  that  when  YpY  is  uphill,  a  constraint  can 

always  be  found  with  a  negative  multiplier  estimate  and  hence  can  be 

T  / 

deleted  more  safely.  The  result  is  also  useful  when  Z  ®n+^r  0  since  then 
the  fact  that  Ypy  is  uphill  implies  there  are  too  many  constraints  in 
the  active  set. 

A 

Theorem  4.  Assume  A  has  full  rank  and  let  YpY  be  defined  by  (3*2). 
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T 

If  en+1  YpY  >  0  ,  then  one  of  the  components  of  Xc  is  negative. 

Proof.  We  know  Xc  is  a  positive  multiple  of  X^  •  The  vector  xL 
satisfies 

*  -  ^Vl  (5-1’ 


This  is  a  characterization  of  the  least  squares  solution  using  the 
projector  matrix  YY^  (see  Stewart  (l973 >  p.  228)).  Thus 


r.  A  YPv  = 


"n+1 


T  T 
YY±Ypy 


and  hence 

m 

"  XL  C  =  ®n+l  YpY  >  0  ' 

Since  c  >  0  it  follows  that  at  least  one  of  the  components  of  \r 

Jj 

and  hence  X  must  be  negative.  Q 
L/ 

T 

It  also  follows  from  the  above  proof  that  if  efi+1  YpY  =  0  ,  then 
either  c  =  0  (covered  by  Theorem  3)  ,  or  the  minimum  element  of  X  is 
zero  or  negative. 

It  is  worth  noting  that  Theorem  4  does  net  hold  in  general  if  other 
multiplier  estimates  such  as  x^  and  ^  are  substituted  for  xc  • 

It  follows  from  Theorem  4  that  If  Ypy  is  an  ascent  direction,  we 
can  delete  the  constraint  corresponding  to  a  negative  component  of  \ 

v 

to  obtain  a  first-order  feasible  descent  direction.  The  procedure  for 
doing  this  is  described  in  Section  6. 


5-2  Avoiding  Rank  Deficiency  in  the  Active  Contraint  Jacobian. 

In  this  section  we  demonstrate  how  the  active  set  can  always  be 
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chosen  so  as  to  avoid  rank  deficiency  in  A  .  We  first  consider  the 

A  A 

consequences  of  A  being  rank  deficient.  If  A  is  rank  deficient  and 
c  /  0  ,  it  is  not  possible  in  general  to  satisfy  the  constraints  of 

QP1  since  they  may  not  be  compatible.  In  such  circumstances  it  might 
be  thought  that  an  adequate  compromise  would  be  the  least  squares  solution 
to  A  p  »  -  c  ,  but  this  may  not  be  first-order  feasible  with  respect 
to  EMP  and  hence  may  not  be  a  descent  direction  for  the  mini max  function 
even  if  it  is  a  descent  direction  for  EMP.  Clearly  it  is  desirable  to 

A 

restrict  the  number  of  constraints  in  the  active  set  so  that  A  has 
full  rank. 

The  way  this  is  done  is  as  follows.  Given  a  set  of  candidates  for 

the  active  set,  we  determine  which  are  to  be  actually  included  in  the 

active  set  during  the  Qft  factorization  of  A  .  Assuming  the  problem  is 

well-scaled,  a  reasonable  order  in  which  to  consider  the  candidates  for 

inclusion  is  given  by  the  size  of  the  constraint  values.  Therefore  we 

order  the  potential  columns  of  A  by  increasing  size  of  fc.}  before 

J 

proceeding  to  do  a  QR  factorization  of  the  matrix  by  columns  without 

column  pivoting  (except  where  several  columns  correspond  to  the  same 

magnitude  of  c.).  If  it  transpires  during  the  factorization  that  any 
3 

A 

of  the  potential  columns  of  A  is  linearly  dependent  on  those  already 
included,  then  the  corresponding  constraint  is  not  included  in  the  active 
set.  Clearly  such  a  process  results  in  a  matrix  A  that  has  full  column 
rank. 

An  example  which  illustrates  this  procedure  is  the  following. 

Suppose  the  initial  candidates  for  the  active  set  are 
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- 

— 

0 

1 

1 

0.5 

0 

10-4 

0 

0 

0.5 

1 

10-5 

tod  A  = 

0 

0 

0 

0 

10"2 

1 

1 

1 

1 

• 

— 

— 

Then  the  first  and  third  constraints  are  selected  for  the  active  set, 
tod  the  second  and  fourth  are  ignored. 

We  must  now  show  that  omitting  any  constraint,  say  c  (x)  ,  from 

r 

A 

the  active  set  to  avoid  rank  deficiency  in  A  does  not  cause  (2.3)  to 
be  violated.  Strictly  speaking,  we  need  not  be  concerned  with  constraints 
for  which  cr(x)  >  0,  but  for  the  moment  we  will  not  assume  Cf(x)  =  0 
since  the  following  also  serves  to  illustrate  why  we  put  potentially 
active  constraints  in  increasing  order. 

Let  a  be  the  gradient  of  c  .  We  have 
r  r 


r  ’£  V1  \ 
i=l 


(5.2) 


for  some  index  s  and  scalars  w.^  ,  i  =  1, ...,s  .  Since  the 
constraints  were  ordered  we  also  have 


A  A 


c  >  C  >  C  ->...>  c, 
r  —  s  —  s-1  —  —  1 


(5.3) 


It  follows  from  (3.2)  and  (5.2)  that 


T 

ar  p 


■  -I  Vi 


(5.4) 


i=l 


f 
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Since  we  have  no  a  priori  information  about  how  the  sizes  of  the  fw^} 
would  change  if  the  columns  were  ordered  differently,  we  have  by  putting 

the  constraints  in  increasing  order  attempted  to  prevent  |arTp|  from 

T 

being  large  and  in  particular  to  prevent  af  p  from  being  a  large 

negative  number.  Ordering  the  constraints  also  ensures  that  the  omitted 

constraint  c  have  as  large  a  value  as  possible,  and  these  two  facts 
r 

combine  to  make  it  as  unlikely  as  possible  that  constraint  r  will 
prevent  a  significant  step  from  being  taken  along  p  .  It  follows  from 
(5.3)  and  (5.4)  that  if  cr  =  0  ,  then  ar  p  =  0  and  hence  (2.3)  is 
satisfied.  However, if  it  is  necessary  to  delete  a  further  constraint 
from  the  active  set  as  described  in  Section  6,  then  (2.3)  may  no  longer 
hold.  This  difficulty  is  circumvented  by  allowing  a  zero  step  to  be 
taken  along  p  (see  Section  8)  and  then  reconsidering  the  choice  of 
active  set  in  the  next  iteration. 

5.3  Projected  Hessian  not  Positive  Definite. 

T 

If  Z  WZ  is  not  positive  definite,  it  makes  no  sense  to  use  the 
solution  of  (3*3),  even  if  it  exists,  to  define  the  direction  of  search. 
Such  a  direction  is  a  step  to  a  stationary  point  which  is  not  a  minimum 
of  the  quadratic  form  on  the  subspace  defined  by  the  constraints  and  may 
even  be  a  maximum.  It  also  makes  no  sense  to  reduce  the  number  of  active 
constraints  (the  projected  Hessian  will  still  be  indefinite)  since  one 
means  of  ensuring  that  the  projected  Hessian  is  positive  definite  is  to 
increase  the  number  of  active  constraints.  An  alternative  definition  of 
the  search  direction  which  is  satisfactory  is  to  utilize  the  modified 
Cholesky  algorithm  of  Gill  and  Murray  (1974). 
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The  following  numerically  stable  matrix  factorization  is  computed: 

m  T 

Z  WZ  +  E  =  LDL 

The  matrix  E  is  a  non-negative  diagonal  matrix  large  enough  to  make 
T 

Z  WZ  +  E  numerically  positive  definite.  The  modified  Newton  direction 

A 

in  the  null  space  of  A  is  then  defined  as  Zq^  where 

LDLTqz  =  -  ZTen+1  .  (5.5) 

Recently  several  more  complicated  methods  have  been  suggested  for  handling 
indefinite  Hessians  ;  see  Fletcher  and  Freeman  (1977),  and  More  and  Sorenson 
(l979).  All  of  these,  including  the  Gill  and  Murray  algorithm,  provide 
ways  of  obtaining  directions  of  negative  curvature.  As  will  be  explained 
shortly,  however , these  are  not  so  useful  in  the  context  of  nonlinear 
constraints; our  main  concern  here  is  simply  to  obtain  a  reasonable  descent 
direction. 

Like  the  Newton  direction  Zq^  in  the  positive  definite  case,  the 

T  / 

vector  Zaz  is  a  descent  direction  provided  that  Z  en+^  f  0  .  If 

T  T 

Z  eQ+1  =  0  and  en+1  Yp^  <0  we  can  simply  set  p  =  Ypy;  if 
T  T 

Z  =  0  and  en+-^  Yp^  >0  we  have  already  explained  that  a  constraint 

T 

may  be  deleted.  Similarly  if  Yp^  =  0  ,  but  a  component  of  is 

negative,  a  constraint  may  be  deleted. 

T  T 

If  Z  WZ  is  not  positive  semi-definite  and  en+^  Yp.^  =  0  , 

T  (k ) 

Z  eQ+1  =  0  and  Xc  >  0  ,  then  the  point  x  is  a  constrained  saddle 
point  (or even  a  maximum).  It  is  desirable  to  compute  a  feasible  arc  of 
negative  curvature  with  respect  to  ,  which  must  exist.  For  the 

case  of  linearly  constrained  minimization  of  a  differentiable  function. 
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Gill "and  Murray  (1974)  have  shown  how  to  compute  a  vector  of  negative 
curvature  using  the  modified  Cholesky  factorization.  However,  it  is  not 
possible  in  general  to  compute  such  an  arc  for  the  nonlinearly  constrained 
case  given  only  W  —  it  is  necessary  to  know  all  the  individual  constraint 
Hessians.  Such  a  computation  would  hence  require  an  unreasonable  amount 
of  storage  as  well  as  computational  effort.  We  therefore  suggest  the 

A 

following.  If  c.  >  0  for  some  j,  delete  the  j-th  active  constraint 

u 

/\ 

and  avoid,  or  at  least  postpone,  the  problem.  If  c  =  0  ,  compute  the 

direction  of  negative  curvature  assuming  the  constraints  are  linear,  which 
is  thus  feasible  to  first  order,  and  try  stepping  along  it.  If  the  minimax 
function  is  lower  at  the  new  point, then  take  this  as  .  Otherwise 

an  alternative  is  to  try  to  obtain  a  lower  point  using  a  function-comparison 
method  such  as  described  in  Swann  (1972).  There  is  no  reasonable  procedure 
which  is  guaranteed  to  work  in  this  situation.  However,  it  should  be  noted 
that  such  a  situation  is  unlikely  to  occur  since  the  basic  modified  Newton 
iteration  of  solving  successive  quadratic  programs  seeks  to  avoid  saddle 
points  and  maxima. 


6.  Deleting  Constraints  from  the  Active  Set. 

T 

In  Section  5  we  explained  that  when  Z  e  ^  =  0  and  a  multiplier  is 
negative,  it  is  necessary  to  delete  a  constraint  from  the  active  set  in 
order  to  obtain  a  first-order  feasible  descent  direction.  It  is  well  known 


in  the  context  of  constrained  optimization  that  it  is  ill-advised 
T 

to  wait  until  Z  en+^  is  zero  or  very  close  to  zero  before  deleting  a 
constraint  corresponding  to  a  negative  multiplier  estimate,  since  doing  so 


is  solving  the  wrong  equality-constrained  subproblem  with  unnecessary 
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accuracy  (minimizing  on  a  manifold).  It  is  also  well  known  that  it  is 
even  more  ill-advised  to  delete  a  constraint  too  early,  when  the  multi¬ 
plier  estimates  are  not  yet  reliable,  since  this  may  cause  the  constraint 
to  be  repeatedly  added  to  and  dropped  from  the  active  set.  Gill  and 
Murray  (l979)  suggest  computing  both  X^  and  ^  when  possible,  never 
considering  deleting  a  constraint  unless  the  estimates  have  the  same  sign 

and  agree  within  a  certain  tolerance.  Here  we  use  Xc  instead  of  XL  • 

T 

Notice  that  Xc  =  ^  when  Z  en+.^  =  0  .  A  further  possibility  is  to 
insist  that  the  following  hold  before  deleting  a  constraint: 

||zTen+1||  <  6  '  min  (l,  -  min  (x^) 

where  6<  1  is  a  constant,  say  6=1.  In  effect,  this  test  ensures 
that  the  higher  the  uncertainty  that  a  multiplier  is  negative,  the  greater 
the  accuracy  to  which  the  minimum  is  approximated  on  the  manifold. 

In  Section  5-1  we  also  pointed  out  a  further  situation  in  which  we 
delete  a  constraint.  This  is  when  en+i  YPy  >  0  •  which  means  (as 

A 

shown  in  Theorem  4)  that  (xn) .  <  0  (and  c.  >  0)  for  seme  j  .  A 
constraint  is  always  deleted  in  this  situation  since  this  is  a  clear 
indication  that  there  are  too  many  constraints  in  the  active  set. 

In  the  rest  of  this  section  we  discuss  what  search  direction  p  to 
choose  after  deleting  constraint  j  with  (Xc)  j  <  0  .  Define  A,  Z  and 
z  by  (4.l).  Define  c  to  be  c  with  Cj  deleted.  Since  A  corresponds 
to  the  new  active  set  we  wish  p  to  satisfy 

a  I  t  r  «  \ 


0 


corresponding  to  (2.l),  and  also  p  f.  <  0  ,  i.e.  (2.2).  If  c.  = 
then  we  must  have 

aj  p  >  0  (6.2) 

to  satisfy  (2.3),  hut  as  explained  in  Section  5*1>  this  is  desirable  even 
if  \  >  0  * 

As  we  noted  in  Section  4,  the  steepest  descent  step  Zs~  in  the 
null  space  of  the  new  set  of  constraints,  given  by  (4.2),  must  satisfy 
(6.2),  but  the  Newton  step  given  by  (4.4)  may  not  satisfy  (6.2)  .  A  step 
satisfying  (6.2)  which  is  preferable  to  the  steepest  de  ^  ‘  step  is  a 
combination  of  the  Newton  step  in  the  null  space  of  A  ,  i.e.  Zq  ,  and 

Z 1 

the  steepest  descent  step  in  the  new  direction  permitted  by  deleting  the 
constraint. 

A. 

Theorem  3.  Assume  A  has  full  rank. 

Suppose  <  0  •  Define  A,  Z  and  z  as  in  (3>l).  Let  r~  be 

defined  by 


where  qz  is  defined  by  (5-5).  Then  p  =  Zr~  satisfies  (6.l),  (2.2) 
and  (6.3)' 

Proof.  It  is  trivial  to  see  that  (6.l)  holds.  We  show  that  (2.2)  is 
satisfied,  i.e.  p  is  a  descent  direction  for  EMP  ,  as  follows: 
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Vi%  -  ViT  lz  z]  T~z 

•p  ip  rp 

=  e  Zq„  -  e  -  zz"e  _ 
n+1  HZ  n-t-1  n+1 

T  T  T  T 

=  -  qz  LM-  qz  -  en+1  zz  _ 

The  first  term  is  zero  or  negative.  The  second  term  is  negative  since 

»■— tti  m 

otherwise  ZZ  e  =  ZZ  e  and  hence  AXT  =  AX  ,  where  XT  is 
n+l  n+l  jl»  L  Jj 

with  (\_).  deleted,  which  implies  (xT).  =  0  ,  a  contradiction. 

L  L  3  u  j 

Thus  ZTX  is  a  descent  direction.  The  proof  that  (6.2)  holds  follows 

<u 

that  of  Theorem  1,  Gill  and  Murray  (l9T9)>  We  have 


A(T1  Am  Arp 

ajZr£  -  a<j  [Z  z]  r~  -  [  0  a<j  z]  r~ 

AT  T 

=  -  a.zz  e  . 
j  n*-l 

Note  that  the  fact  that  A  has  full  rank  implies  that  a.Z  f  0  and  hence 

«T  /  /  .  T 

a.z  f  0  .  Multiplying  both  sides  of  (5-1)  hy  z  we  have 
J 


rp/v  m 

(Vj  **J-Z  Vl 


SO 

Note  that  there  is  no  guarantee  that  the  corresponding  range  space 

a# 

step  Yp~  to  the  modified  set  of  constraints  satisfies  either  (2.2  )  or 
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(6.2).  This  step  is  defined  by 

A  Yp~  =  -c 

~  A  A  <w 

where  c  is  c  with  c.  deleted  and  the  orthogonal  columns  of  Y 

tJ 

span  the  range  of  A  .  However,  since  Zr~  satisfies  (2.2)  and  (6.2) 
with  strict  inequality,  so  does  Yyp~  h  Zr~  for  small  enough  Y  . 

Although  Zr~  is  guaranteed  to  satisfy  the  required  properties 
(6.l),  (2.2)  and  (6.2),  the  Newton  step  Zq~  may  be  preferable  if  it 
satisfies  (6.2).  In  fact,  the  step  Zp~  ,  defined  by 

Z  wZp~  =  -  Zi(en+1  +  WYp~)  , 

may  be  preferable  to  either,  but  this  is  not  guaranteed  to  even  be  a 

descent  direction.  We  recommend  computing  Zq~  and  performing  the 

appropriate  inner  product  to  check  whether  it  is  a  feasible  descent 

direction,  falling  back  on  yTPy  +  Zr~  if  it  is  not,  and  substituting 

Zq~  for  Zr~  if  it  is.  This  may  be  done  even  if  >  0  ,  which  may 

T 

be  the  case  if  we  are  dieting  a  constraint  when  en+l  ^Py  >  ®  > 

since  Theorem  ^  does  not  hold  with  ^  substituted  for  (even  if  W 

is  assumed  to  be  positive  definite ) .  Although  when  exact  Hessians  are 
available,  it  is  normally  inadvisable  to  delete  a  constraint  and  take  a 
Newton  step  when  (p^K  >  0  ,  in  this  case  it  is  desirable  to  delete  a 
constraint  and  the  only  question  is  what  step  to  take. 

In  all  of  the  above, when  a  constraint  is  deleted  from  the  active  set 
it  is  not  necessary  to  recompute  the  factorizations  of  A  and  Z  WZ  from 
scratch.  They  can  be  obtained  by  updating  the  factorizations  already 
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available,  as  described  in  Gill,  Golub,  Murray  and  Saunders  (l974) 
and  Gill  and  Murray  (1974). 


Note  that  we  have  been  able  to  always  obtain  a  satisfactory  choice 
for  p  by  deleting  only  one  constraint  with  a  negative  multiplier 
estimate . 


2 

It  may  be  that  the  Hessians  fv  f^}  are  unavailable  either  because 
they  are  too  expensive  to  evaluate  ox-  too  difficult  to  determine. 

Here  we  outline  the  various  alternative  ways  to  approximate  W  . 

Recall  from  Section  1.1  that 


so  that  when  analytical  Hessians  are  used  a  matrix  of  order  n  is  stored. 
The  two  basic  alternatives  are  using  a  finite  difference  approximation  or 
a  quasi-Newton  approximation. 

A  finite  difference  approximation  to  W  ,  unlike  a  quasi-Newton 
approximation,  requires  extra  gradient  evaluations  to  be  done  at  each 
iteration.  However,  it  is  important  to  note  that  extra  gradient  evaluations 
of  only  the  t  active  functions  are  required,  where  t  may  be  significantly 
less  than  m  ,  the  number  of  functions  which  must  be  evaluated  at  each 


iteration.  Furthermore,  since  the  matrix  W  is  not  explicitly  required 
to  compute  p  we  can  form  a  direct  finite  difference  approximation  to 
WZ  by  differencing  the  gradient  of  the  Lagrangian  function  along  the 
columns  of  Z  .  It  is  also  necessary  to  difference  the  gradients  along 


Ypy  to  obtain  an  approximation  to  WYpY  (for  computing  p^  and 
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> 


Thus  the  t  active  gradients  must  each  he  evaluated  only  n  -  t  +  2 
times,  which  may  be  considerably  less  than  the  n  +  1  required  to 
approximate  W  . 

When  a  quasi-Newton  method  is  used,  two  approaches  are  possible. 

One  is  to  maintain  an  approximation  B  to  the  full  matrix  W  .  The 

problem  with  this  approach  is  that  W  may  have  negative  eigenvalues 

at  the  solution  and  so  a  hereditary  positive  definite  update  may  not  be 

appropriate.  Nonetheless,  Powell  (1977)  has  shown  that  it  is  possible  to 

maintain  a  positive  definite  approximation  and  still  obtain  superlinear 

local  convergence.  In  the  case  of  linearly  constrained  optimization.  Gill 

and  Murray  (1974)  have  suggested  maintaining  a  quasi-Newton  approximation 

-T — 

to  the  projected  Hessian  Z  WZ  ,  a  strategy  with  two  major  advantages: 
the  matrix  to  be  stored  has  lower  dimension,  and  because  the  projected 
Hessian  is  positive  definite  at  the  solution  a  positive  definite  update 
is  appropriate.  Murray  and  Wright  (l978)  have  suggested  and  numerically 
investigated  a  number  of  variants  of  a  similar  approach  for  problem  NCP. 

An  additional  difficulty  with  nonlinear  constraints  is  that  the  matrix  Z 
is  changing  at  every  iteration  as  well  as  the  matrix  W  .  We  note  that 
when  this  technique  is  used  the  vector  pz  (as  opposed  to  qz)  and  the 
second  order  multiplier  estimate  ^  are  not  made  available  .  However,  even  if 
the  full  matrix  is  approximated  these  vectors  are  likely  to  be  unreliable 
since  the  quasi-Newton  approximation  is  not  reliable  in  the  sense  that 
Bp  is  rarely  a  close  approximation  to  Wp  .  When  a  constraint  is 
deleted  from  the  active  set,  the  approximation  may  be  augmented  as 
follows : 
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The  resulting  direction  Zq£  (substituting  Z  BZ  for  Z  WZ)  is  then 
identical  to  the  direction  Zr~  described  earlier. 

The  choice  between  exact  Hessians,  finite  difference  of  gradients, 
and  a  quasi-Nevrton  method  will  depend  on  various  circumstances.  For  further 
details  for  both  finite  difference  strategies  and  quasi-Newton  strategies 
the  reader  is  referred  to  Murray  and  Wright  (1978),  where  the  same  topic 
is  discussed  with  regard  to  algorithms  for  solving  problem  NCP. 

One  of  the  prime  applications  of  the  minimax  algorithm  is  to  data 
fitting  problems  in  the  f  norm  (see  Section  ll).  These  problems 

GO 

typically  have  a  large  number  of  functions  f^  (observations),  but  a 
relatively  small  number  of  variables.  Furthermore  if  the  functions  are 
not  too  highly  nonlinear  the  number  of  active  constraints  at  the  solution 
may  be  close  to  n+1  (it  would  be  exactly  n+1  if  the  problem  were  linear). 
Thus  it  may  often  be  that  t  «  m  and  n+l-t  «  n,  exactly  the  situation 
where  the  finite  difference  scheme  is  most  efficient.  Since  the  finite 
difference  scheme  will  normally  produce  a  better  direction  of  search 
at  each  iteration  than  a  quasi-Newton  method  and  also  has  a  higher 
rate  of  convergence,  it  will  normally  take  significantly  fewer 
iterations  to  reach  the  solution.  Thus  the  additional  work  at 
each  iteration  may  result  in  a  substantial  saving  overall. 


1+0  • 


It  is  worth  noting  that  a  feasible  point  algorithm  for  problem 
NCP  (such  as  the  barrier  trajectory  method  described  in  Wright  (1976)) 
may  have  to  evaluate  all  m  functions  for  each  column  of  Z  when 
using  a  finite  difference  scheme.  The  reason  for  this  is  to  ascertain 
that  each  evaluation  point  is  feasible  and  hence  the  t  active  gradients 
are  well-defined.  This  consideration  does  not  arise  with  the  minimax 
problem. 

8,  Determining  the  Steplength. 

(k) 

In  the  previous  sections  we  have  shown  how,  given  x  ,  we  may 
obtain  a  satisfactory  direction  of  search  p  which  is  a  descent  direction 
for  the  minimax  function.  We  now  discuss  how  to  determine  a  steplength 
a  to  take  along  p  .  Although  a  simple  steplength  algorithm  may  be  all 
that  is  required  to  meet  convergence  criteria  for  the  overall  algorithm, 
from  the  point  of  view  of  efficiency  it  is  important  that  the  step 
achieve  as  large  a  reduction  in  the  value  of  the  function  as  possible, 

given  a  certain  limit  on  the  effort  to  be  expended.  Since  FM  is  not 
differentiable  everywhere  a  steplength  algorithm  intended  for  differen¬ 
tiable  functions  will  not  be  efficient.  We  therefore  use  the  steplength 
algorithm  which  is  described  at  some  length  in  Murray  and  Overton  (1979). 
This  algorithm  is  designed  specifically  for  the  minimax  problem  and 
related  problems.  It  includes  a  facility  for  varying  the  limit  on 
the  effort  to  be  expended,  producing  anything  from  an  algorithm  which 
normally  takes  only  a  single  function  evaluation  to  one  which  does 


an  exact  linear  search 


The  steplength  algorithm  requires  an  initial  guess  at  a  ,  say 


,  where  Fw  is  to  be  evaluated  first.  We  set  a.  to  either  one,  or 
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the  estimated  step  to  the  nearest  inactive  constraint  using  the  linear 

(k) 

approximations  at  x  ,  if  this  is  less  than  one.  Thus 

aQ  =  min  {l,o£}  (8.1) 

ci  .  ip 

where  o'  =  min  { - =r  |  a.p  <  0  and  i  not  in  active  set}. 

a^P 

It  is  possible  that  a^=  0  and  hence  the  algorithm  must  allow  for  this 
possibility.  In  this  case  the  appropriate  inactive  constraint  must  be 
added  to  the  active  set  and  the  iteration  is  repeated. 

9.  Flowchart  of  the  Algorithm. 

We  summarize  the  basic  iteration  of  the  algorithm  in  the  flowchart 
in  Figure  2.  For  simplicity  we  have  omitted  any  tolerances  from  the  flow¬ 
chart  though  clearly  in  practice  these  must  be  included.  The  parameters 
and  y2  are  optional.  For  the  results  presented  in  Section  14, 

Y-^  is  set  to  1  when  possible  but  y2  is  always  set  to  zero.  The 
latter  is  done  because  near  a  saddle  point  or  after  deleting  a  constraint 
a  poor  range  space  direction  can  swamp  a  good  null  space  direction  if 
Yg  >  0  .  The  notation  S  is  used  to  mean  either  W  or  a  finite- 
difference  or  quasi-Newton  approximation  to  W  . 
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10.  Selecting  the  Active  Set. 

The  success  of  the  algorithm  we  have  described  depends  on  being 
able  to  make  a  reasonable  choice  of  the  active  set  at  each  iteration. 

If  constraints  are  required  to  have  very  small  magnitude  to  be  included 
in  the  active  set,  then  the  iterates  will  follow  the  constraint  bounda¬ 
ries  very  closely  and  the  convergence  will  be  slow.  Conversely  if  too 
many  constraints  are  selected  as  active  the  directions  of  search  may  be 
poor.  In  this  section  we  describe  an  active  set  strategy  that  has  been 
most  successful  for  the  numerical  experiments  we  have  carried  out.  It 
is  likely,  however,  that  this  strategy  will  be  modified  in  the  future 
after  more  extensive  numerical  experience. 

Clearly  one  feature  required  of  the  active  set  strategy  is  that, 

* 

as  the  iterates  approach  x  ,  it  should  become  successively  more  difficult 
for  constraints  with  magnitudes  significantly  greater  than  zero  to  be 
included.  One  way  to  accomplish  this  is  to  have  the  strategy  depend  on 
a  parameter  which  is  reduced  as  the  solution  is  approached  or  if  any 
difficulties  arise.  We  found  that  reducing  a  parameter  in  this  way  was 
not  satisfactory  since  it  can  easily  happen  that  the  parameter  is  reduced 
too  much,  making  it  impossible  to  ever  include  all  the  correct  active 
constraints.  Instead  we  take  the  following  approach.  There  is  always  at 
least  one  constraint  active  with  value  identically  zero  so  the  first 
decision  is  whether  to  include  a  second  constraint.  If  this  is  done  the 
decision  of  whether  to  include  others  is  tied  to  the  magnitude  of  the 
second  constraint.  Thus  the  required  objective  will  be  achieved,  pro¬ 
vided  that  the  first  decision  is  made  correctly  so  that  any  second  active 
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constraint  approaches  zero  as  the  iterates  approach  the  solution. 

Let  us  order  the  constraints  so  that  0  =  c,  <  c„  <  ...<  c  , 

with  the  and  fv^}  correspondingly  ordered.  We  include  in  the 

active  set  all  constraints  with  magnitude  less  than  a  very  small  tolerance, 
any  Kq-  In  order  to  compare  the  larger  constraint  magnitudes  we  must 
scale  them  in  some  way.  For  problem  t  P  (see  Section  11 )  we  define  the 

CO 

scaled  values  by  c.  =  ci  ,  since  the  value  of  F  can  only  be  very 

1  % 

small  if  all  the  constraints  values  are  very  small.  For  problem  MMP 

the  value  of  F,.  could  be  zero  or  negative  so  we  define  the  scaled  values 
M 

-  C 

by  c  =  i  .  Scaling  the  values  is  necessary  since  two  functions 

l+lfll+tfg! 

with  values  999  and  1000  are  likely  to  both  be  active  if  one  is,  while 
this  is  not  true  of  functions  with  values  1  and  2  .  Notice  that  the 

existence  of  any  functions  in  problem  MMP  with  negative  values  much  less 
than  Fm  does  not  affect  the  definition  of  c^  ,  which  is  appropriate 
since  it  does  not  affect  the  solution. 

* 

If  only  one  constraint  is  active,  then  llv^Cx))!  =  0  .  Therefore 

the  decision  of  whether  to  include  a  second  active  constraint  cg  is  made 
as  follows.  If  ||vjJ|  is  small,  say  j|vj|  <  ,  then  c 2  is  included 

only  if  cg  <  KgllvJ]  ,  where  ^  >  1  ,  a  test  which  can  be  justified 
by  Taylor  expansions  around  the  solution.  If  [Jv^jj  >  ,  then  cg 

is  included  only  if  ,  where  0  <  <  1  . 

It  remains  to  test  the  other  constraint  values  against  cg  .  We 

K. 

include  ^  if  ^  ^  is  included  and  c^  <  and  c^  <  4  , 

i  3,1*.,...,  min(m,n+l)  ,  where  0  <  <  1  •  Thus,  for  example,  if 
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and 


=  0.5  ,  is  included  if  Cg  is  included,  cg  =  10 

_  _4 

c^  =  10  ,  but  not  if  Cg  =  0.1  and  c^  =  0.5*  Note  that  we  always 

have  Cg  <  1  . 

We  found  that  it  was  also  important  not  to  include  a  constraint 
in  the  active  set  if  it  was  included  but  subsequently  deleted  in  the 
previous  iteration.  It  was  also  helpful  to  include  a  constraint 
requiring  only  that  c^  <  if  the  previous  line  search  chose  a 
step  to  a  point  of  discontinuity  involving  the  corresponding  function. 

Sometimes  some  constraints  are  much  smaller  than  others  which 
should  also  be  in  the  active  set  (for  example  if  some  of  the  functions 
are  linear).  In  such  a  situation  it  is  not  appropriate  to  compare 
all  the  constraints  with  Cg  .  The  remedy  is  to  let  the  first  constraint 
with  value  significantly  greater  than  the  machine  precision  play  the 
role  of  c2  .  However,  then  v^  must  be  replaced  by  a  projected 

rp  /v 

gradient  Z  v^  in  the  tests,  and  the  factorization  of  A  and  the 
accumulation  of  the  transformations  which  form  Z  must  be  performed 
as  the  active  set  selection  proceeds.  We  have  not  yet  implemented 
this  modification. 

Our  current  active  set  strategy  is  the  above  with  hq  set  to 
the  square  root  of  the  machine  precision,  =  0.1  ,  Hg  =  25  , 

Kj  =  0.25  for  I^P  ,  Kj  =  0.1  for  MMP,  and  =  0-5  • 
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11.  The  Special  Case  of  the  Least  1^  Norm  Problem. 

In  this  section  we  consider  problem  I  P  ,  defined  in  Section  1. 

CD 

Problem  I  P  is  an  important  special  case  of  MMP.  Note  that  I^P  must 
have  a  solution,  whereas  MMP  may  not.  Clearly  it  is  preferable  to  treat 
IP  in  a  special  manner  rather  than  just  treat  |f. |  as  max  (-f.  ,f. )  . 

If  we  eliminate  the  possibility  that  is  zero  at  the  solution,  i.e. 

*11  the  {fjJ  are  zero  at  the  solution,  then  we  can  observe  that  only 
one  of  each  pair  of  constraints  x  f. (x)  >  0  ,  x  f. (x)  >  0  , 
can  be  active  at  the  solution.  Thus  defining  c^  =  sgn(f^ (x) )  for  any  x 
it  is  straightforward  to  handle  IP  by  using  the  algorithm  described 
for  MMP,  replacing  f.  by  everywhere.  The  only  places  where  it  is 

necessary  to  consider  the  inactive  constraints  fc|  =  xn+1+  o^f.^  are  in 
the  determination  of  the  initial  guess  at  the  steplength  aQ  ,  and  in 
the  steplength  algorithm  itself.  Thus  must  be  set  to 

o0=  min  fl,ar*,og} 

where  a q  is  defined  by  (8.l)  and  «q  is  given  by 

c 1 

a'o  =  min  f - “  |  7c£Tp  <0,  i  =  1,2,..., m}  . 

7c^  p 

The  changes  which  must  be  made  to  the  steplength  algorithm  are  indicated 
in  Murray  and  Overton  (1979). 

If  is  zero  at  the  solution  there  is  still  no  difficulty  with  this 
approach  provided  that  m  >  n+l  ,  since  then  of  the  2m  constraints  active 
for  the  equivalent  problem  corresponding  to  IP  ,  at  most  n+l  can  be 
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included  in  the  active  set  to  obtain  a  full  rank  active  constraint 

Jacobian.  Thus  including  in  the  active  set  at  most  one  constraint 

of  each  pair  causes  no  difficulty.  Such  a  situation  is  a  highly 

degenerate  one  but  the  point  is  that  if  the  situation  does  arise  the 

algorithm  will  take  care  of  it  efficiently.  The  usual  source  of 

I  P  is  data  fitting  problems  so  we  expect  almost  always  to  have 
00 

m  >  n+1  .  However,  if  it  does  happen  that  we  wish  to  solve  i  P  with 
—  00 

m  <  n  then  the  above  technique  may  be  very  inefficient  since  both 
constraints  in  a  pair  of  constraints  active  at  the  solution  cannot  be 
put  into  the  active  set.  Instead  of  making  a  complicated  modification 
to  take  care  of  this  unlikely  possibility,  we  recommend  writing  the 
l roblem  explicitly  in  the  form  MMP  and  solving  this  directly. 

It  is  straightforward  to  generalize  the  above  to  an  algorithm  which 
can  be  used  to  minimize  a  more  general  function 

Fgm(x)=  max  fmax  |fi(x)|,  max  f^Cx)}, 

1  <  i  <  m^,  m^+  1  <  i  <  m 

assuming  m^  >  n+1  if  m^  4  0  .  Since  coping  with  the  general  case 

introduces  very  little  extra  overhead  our  implementation  of  the  algorithm 

handles  this  wider  class  of  functions.  In  this  way  one  algorithm  takes 

care  of  both  MMP  (m,  =  o)  and  l  P 

1  <*> 

11.1  The  Haar  Condition. 

We  comment  here  on  the  meaning  of  the  Haar  condition  since  this  is 
usually  discussed  in  the  context  of  I  approximation.  Let  us  first 
consider  the  case  that  the  f^  are  linear.  The  Haar  condition  is  said 


(m^  =  m). 


49 


to  hold  for  these  functions  if  every  n  x  n  submatrix  of  V  is  non¬ 


singular,  where  V  is  the  n  x  m  matrix  whose  columns  are  the  gradients 

of  the  ff. }  .  Consider  problem  EMP  which  is  now  a  linear  programming 
1  * 

problem.  A  necessary  condition  for  x  to  be  a  solution  to  EMP  is  that 
V  \  =  0  and  \  f  0  (since  e  \  =  l)  .  Thus  the  requirement  that  the 
Haar  condition  holds  implies  that  there  be  at  least  n+1  active  con¬ 
straints  at  the  solution  with  none  of  the  multipliers  equal  to  zero, 
and  hence  that  the  solution  is  unique.  Most  algorithms  which  solve  the 
linear  mini max  problem  do  so  by  solving  a  linear  program  related  to  ELP. 

Thus  whether  or  not  the  Haar  condition  holds  is  quite 

irrelevant  to  the  difficulty  of  solving  t  P  ,  since  zero  multipliers 

cause  no  real  difficulty  in  solving  linear  programs.  Degeneracy,  which 

A 

occurs  when  the  matrix  A  is  rank  deficient  and  can  in  theory  cause 

problems  in  solving  a  linear  program,  can  occur  whether  or  not  the  Haar 

condition  holds.  The  significance  of  the  Haar  condition  is  that  if  it 

does  not  hold  the  solution  may  not  be  unique,  and  hence  one  may  be 

interested  in  a  "strict"  solution  to  the  data  approximation  problem  which 

is  unique,  i.e.  that  solution  of  I  P  which  minimizes 

00 

* 

max  f|f^(x)|  f^  inactive  at  x} 

(see  Rice  (1969)  and  Brannigan  (1978)). 

The  Haar  condition  is  much  stronger  than  necessary  to  ensure  unique¬ 
ness.  A  slightly  more  reasonable  condition  is  that  there  be  n+1  constraints 
with  nonzero  multipliers  active  at  the  solution  of  i^P  (see  Jittomtrum 
and  Csborne  (1979,  p.5)  for  an  example  showing  that  this  condition  is 
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still  stronger  than  necessary  to  ensure  uniqueness).  This  condition 

cannot  he  checked  without  solving  ta P  but  solving  I^P  is  actually 

much  easier  to  do  than  checking  whether  the  Haar  conditon  holds. 

Let  us  now  consider  the  nonlinear  case.  The  Haar  condition  is  then 

said  to  hold  at  a  point  x  if  every  subset  of  gradients  of  functions 

with  zero  constraint  values  at  x  is  linearly  independent.  As  in  the 

* 

linear  case  we  are  really  only  concerned  with  the  situation  at  x  .  Thus 

* 

* 

the  Haar  condition  holds  at  x  if  every  n  X  n  submatrix  of  V  has  full 

rank.  Again  it  follows  that  if  the  Haar  conditon  holds  there  must  be 

* 

n+1  constraints  with  nonzero  multipliers  active  at  x  .  This  condition 

is  however  a  much  more  unreasonable  one  than  in  the  linear  case.  There 

must  always  exist  a  solution  to  the  linear  problem  where  n+1  constraints 

are  active,  but  the  nonlinear  problem  can  have  a  unique  solution  with 

anything  from  1  to  n+1  constraints  active.  If  the  Haar  condition  does 
* 

* 

hold  at  x  then  Z  is  null  and  the  problem  can  be  adequately  solved  by 

a  method  using  only  first-order  information.  Thus  our  algorithm  has  been 

designed  with  the  assumption  that  the  Haar  condition  often  does  not  hold. 

When  l  P  arises  from  data  fitting  problems  it  may  sometimes  be  the  case 
00  * 

that  the  Haar  condition  can  be  expected  to  hold  at  x  .  However,  the 
only  way  to  ascertain  whether  the  Haar  condition  does  indeed  hold  is  to 
solve  the  problem.  Since  we  cannot  be  sure  at  the  outset  it  is  clearly 
unsatisfactory  to  be  using  an  algorithm  which  uses  first  order- information 
only  and  hence  may  indicate  that  the  Haar  condition  does  not  hold  by 
extremely  poor  performance.  For  many  data  fitting  problems  the  number  of 
active  constraints  at  the  solution  may  be  close  to  n+1  ,  and  as  pointed 
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out  in  Section  7  this  means  using  a  finite  difference  approximation  to 
T 

Z  WZ,  which  has  order  n+l-t  ,  may  be  quite  inexpensive. 

Cromme  (1972)  discusses  in  a  broader  setting  a  weaker  condition  than 
* 

the  Haar  condition  at  x  called  strong  uniqueness  which  still  ensures 

that  a  particular  algorithm  using  only  first-order  information  converges 

quadrat ically.  Strong  uniqueness  implies  that  n+1  constraints  are 
* 

active  at  x  but  is  weaker  than  the  Haar  condition  since  it  permits  active 

constraints  with  zero  multipliers.  Jittorntrum  and  Osborne  (1979)  discuss 

a  still  weaker  condition  which  arises  from  examining  the  curvature  in  the 

null  space  of  the  active  constraint  Jacobian  with  gradients  corresponding 

to  zero  multipliers  deleted.  This  weaker  condition  also  ensures  that  a 

first-order  method  has  quadratic  convergence.  The  sufficient  conditions 

for  a  minimum  given  in  Section  1.1  are  much  weaker  than  any  of  these 

* 

conditions  since  they  permit  t(x)  to  be  less  than  n+1  ,  in  which  case 
an  efficient  algorithm  must  approximate  Az  .  These  conditions  could 
themselves  be  weakened  if  curvature  corresponding  to  zero  multipliers  were 
examined. 
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12.  Relationships  to  Other  Algorithms 


A  number  of  other  algorithms  have  been  proposed  to  solve  the  nonlinear 
minimax  problem.  It  was  not  until  recently,  however,  that  special  algorithms 
for  MMP  which  make  use  of  second-order  information  appeared.  Han  ( 1977b,  1978a, 
1978b)  suggests  methods  which  solve  a  sequence  of  quadratic  programming  prob¬ 
lems  using  a  quasi-Newton  approximation  to  W  .  These  will  discussed  further 
shortly.  Watson  (1979)  and  Hald  and  Madsen  (1978)  proposed  two-stage  methods 
which  begin  by  using  the  first-order  methods  of  Anderson  and  Osborne  (1977) 
and  Madsen  (1975),  respectively,  and  switch  to  solving  a  system  of  nonlinear 
equations  using  the  second-order  information  in  W  when  it  is  thought  that 
the  active  set  has  been  identified.  The  system  is  of  order  n+l+t  ,  i.e., 
the  multipliers  and  variables  are  all  obtained  at  once.  Recall  that  for 
problem  ,  t  is  often  close  to  n+1  so  the  systems  that  Watson  and 

Hald  and  Madsen  solve  may  be  much  larger  than  the  ones  we  solve.  The  direc¬ 
tion  of  search  obtained  is  not  necessarily  a  descent  direction  for  the  mini- 
max  function  but  only  for  the  residual  of  the  nonlinear  system.  A  method 
related  to  the  second  stage  of  these  methods  was  given  by  Hettich  (1976). 

Conn  (1979)  presents  a  method  which  is  derived  from  the  point  of  view  of 
a  nondifferentiable  penalty  function.  It  is  related  to  the  algorithm  of 
Charalambous  and  Conn  (1978)  but  uses  second-order  information.  We  discuss 
this  method  further  below.  Other  methods  which  use  second-order  information 
and  are  related  to  Han's  method  are  disucssed  by  Charalambous  and  Moharram 
(1978,1979)  and  Wierzbicki  (1978). 
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Our  algorithm  is  most  closely  related  to  the  methods  of  Han  (l977b, 
1978a)  and  Conn  (1979),  so  we  discuss  these  further  here.  The  primary 
difference  between  our  method  and  Han's  method  is  that  we  attempt  to  iden¬ 
tify  the  active  set  at  each  iteration  and  then  solve  an  equality- con¬ 
strained  quadratic  program  (EQP) ,  modifying  the  resulting  direction  of 
search  and  the  active  set  if  necessary,  while  Han  solves  an  inequality- 
constrained  quadratic  program  (iQp),  thus  implicitly  selecting  the  active 
set  associated  with  the  solution  of  IQP.  The  IQP  has  the  form: 

IT  T 

min  g  p  Wp  +  eQ+1  p 

T 

subject  to  A  p  >  -c 

where  c  and  A  are  the  vector  and  matrix  of  all  the  constraints  and 
their  gradients. 

The  dichotomy  of  whether  to  solve  EQP  or  IQP  occurs  at  all  levels 
of  constrained  optimization.  Murray  (l969a)  considered  solving  the  IQP 
associated  with  his  algorithm  for  NCP  but  found  an  EQP  strategy  more 
successful.  He  also  considered  a  strategy  of  partially  solving  IQP. 

The  same  question  arises  in  linearly  constrained  optimization.  Brayton 
euid  Cullum  (l977)  report  some  results  which  indicate  that  for  the  case 
of  minimization  subject  to  bounds  on  the  variables  (the  simplest  possible 
constrained  optimization  problem),  solving  IQP  is  not  in  general  more 
efficient  than  solving  EQP. 
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The  motivation  for  solving  IQP  is  that  it  makes  the  fullest  use 
(k) 

of  the  information  at  x  .  Furthermore  it  simplifies  the  description 
of  the  algorithm  and  for  problem  MMP  makes  it  straightforward  to  get 
a  descent  direction  since  positive  constraints  are  not  selected  for  an 
active  set  except  in  the  process  of  solving  IQP.  Clearly  one  disadvan¬ 
tage  of  solving  IQP  is  that  it  is  more  work  than  solving  EQP.  If  m  »  n 
and  the  function  evaluations  are  not  too  expensive  then  an  algorithm 
which  solves  IQP  may  be  extremely  inefficient  compared  to  one  which  solves 
EQP.  However  this  is  not  the  main  objection  to  solving  IQP.  The  main 
motivation  for  solving  EQP  rather  than  IQP  is  that  the  linear  approximations 

to  the  constraints  (for  NCP  and  MMP)  and  the  quadratic  approximation  to  the 

(k) 

Lagrangian  function  are  unreliable  away  from  the  current  point  x 

The  process  of  solving  IQP  involves  successively  making  decisions  about 

which  constraints  to  include  in  the  active  set  at  points  which  may  be  quite 


(k)  (k) 

far  from  x  ,  based  on  the  approximations  at  x 


Thus  the  final  point 


at  which  this  decision  is  made,  the  solution  of  IQP,  may  be  the  result  of 

(k) 

choosing  an  active  set  which  has  no  meaning  whatsoever.  If  x  is  so 

close  to  x  that  the  approximations  are  satisfactory  then  IQP  may  still 

have  no  advantage  over  EQP  since  they  may  well  have  the  same  solution. 

Most  of  the  differences  between  Han’s  method  and  ours  result  from 

the  difference  between  IQP  and  EQP  (he  uses  the  multipliers  from  the 

old  IQP  and  requires  that  the  full  Hessian  ft  is  positive  definite, 

T 

while  we  use  Ac  to  define  W  and  require  only  that  Z  WZ  be 
positive  definite).  He  discusses  only  quasi-Newton  methods  while  we 
consider  also  a  finite  difference  strategy.  Finally,  although  his 


line  search  is  used  to  obtain  a  reduction  in  F^  it  is  not  designed 

specially  for  nondifferentiable  functions  as  ours  is. 
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Han  ( 1978b)  presents  another  algorithm  for  MMP  which  is  related  to 
the  one  discussed  above.  It  is  quite  different  however  in  that  the  line 
search  takes  place  in  the  (n+l)  dimensional  space.  The  line  search  obtains 
a  reduction  in  the  function 

m 

6(ff)  =  Vl+  <*Vl+Z  max(fi(i  +  “P  }  -  (Vl+  aP n+l),o) 

i=l 

instead  of  FM(x  +  op  ).  The  motivation  given  for  this  is  that  0  takes 
into  account  some  inactive  functions  while  F„  gives  bias  completely  to 
the  active  functions.  However,  our  view  is  that  inactive  functions  should 
be  considered  in  the  line  search  only  if  they  are  likely  to  become  active 
along  the  line,  and  this  is  exactly  what  our  special  line  search  to  reduce 
the  minimax  function  does. 

The  problem  with  reducing  0  is  that  it  relies  too  much  on  the 
value  of  Pn+1  which  gives  only  a  linear  approximation  to  the  active 
functions  at  x^  .  It  is  easy  to  construct  examples  where  minimizing 
0  along  the  line  results  in  a  much  smaller  reduction  of  FM  than  is 
possible.  We  feel  that  using  0  instead  of  FM  in  the  line  search  is 
discarding  one  of  the  most  useful  tools  available  to  solve  MMP. 

The  analysis  in  Han  (l978b)  is  concerned  with  the  fact  that  the 
quadratic  form  of  IQP  is  not  positive  definite  in  R1^1  .  Our  view  how¬ 
ever  is  that  the  only  matrix  whose  positive  definiteness  should  be  a 
concern  is  the  projected  Hessian,  and  this  is  the  same  in  Rn  and  Rn  1 
since  2Tff2  =  ZTWZ  .  Recall  that  we  expect  the  dimension  of  this  matrix 


to  be  much  smaller  than  that  of  W  . 
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We  now  turn  our  attention  to  the  algorithm  of  Conn  (l979) ,  which  is 
more  closely  related  to  ours  in  a  number  of  ways.  Like  us,  Conn  attempts 
to  identify  the  active  set  and  solves  a  related  EQP  at  every  iteration. 
However,  unlike  ours,  his  search  direction  does  not  include  a  component 
in  the  space  spanned  by  Y  unless  there  is  reason  to  believe  that  x 
is  near  a  stationary  point.  Furthermore  he  uses  the  Hessian  of  the 
Lagrangian  function  to  give  the  quadratic  form  of  the  QP  only  if  x^  is 
thought  to  be  near  a  stationary  point.  Otherwise  he  uses  instead  the 
Hessians  of  one  of  the  active  functions  at  each  iteration.  His  reason  for 
this  is  that  the  Lagrange  multipliers  may  be  highly  inaccurate  away  from 
a  stationary  point.  Although  this  is  certainly  true,  our  view  is  that 
using  the  Hessian  of  only  one  function  is  equivalent  to  using  a  multiplier 
estimate  with  only  one  component  equal  to  one  and  the  rest  zero,  and  that 

this  is  no  less  arbitrary  than  using  any  other  vector  of  nonnegative 
components  which  sum  to  one  to  define  W  .  As  we  explained  at  the  end 
of  Section  4,  our  algorithm  always  uses  such  a  vector  to  define  W  . 

Our  approach  eliminates  any  need  to  decide  when  to  switch  from  one  strategy 
to  another,  something  which  it  is  difficult  to  do  since  it  is  hard  to 
tell  how  accurate  multiplier  estimates  are.  Furthermore  using  different 
Hessians  at  different  iterations  makes  a  quasi -Newton  approach  difficult. 

There  are  many  other  differences  between  Conn's  algorithm  and  ours 
which  follow  because  of  the  fact  that  his  approach  is  via  a  nondifferent- 
iable  penalty  function  while  ours  is  via  a  Langrangian  function.  For 

A 

example,  he  does  not  factorize  the  matrix  A  as  we  do,  but  instead  factor- 

^  A 

izes  a  matrix  M  which  is  -V  less  one  column  v^  corresponding  to  the 
one  function  whose  Hessian  is  to  be  computed.  This  matrix  factorization 
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is  then  updated  to  give  a  factorization  of  the  matrix  resulting  from  adding 

A 

v ,  to  each  column  of  M  .  It  can  be  shown  that  this  approach  restricts 

p  to  the  same  null  space  as  our  algorithm.  Multiplier  estimates  can  also 

be  computed  by  this  approach  but  they  will  not  be  the  same  as  either  XL 

or  \c  since  they  give  bias  to  function  j  .  Conn  shows  how  to  take 

advantage  of  any  explicitly  linear  functions  in  his  algorithm. 

In  a  way  our  algorithm  treads  the  middle  ground  between  Han's  method 

(k) 

and  Conn's  method.  Han  relies  on  the  approximation  at  x  so  completely 

that  he  solves  the  IQP.  Conn  distrusts  the  multiplier  estimates  and  does 

(k) 

not  use  them  unless  x  is  near  a  stationary  point.  We  believe  that 
some  multiplier  estimates  are  better  than  no  estimates  at  all,  but  we  solve 
the  QP  which  relies  as  little  on  the  approximations  as  possible. 
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13*  Convergence  Properties. 

In  the  limit  our  method  becomes  a  projected  Lagrangian  method  for 
NCP  applied  to  the  special  case  MMP  and  so  we  can  make  use  of  the  known 
asymptotic  local  convergence  results  for  these  methods.  Robinson  (1974) 
showed  that  the  method  of  Wilson  (1963)  has  a  quadratic  rate  of  local 
convergence  if  analytical  Hessians  of  the  objective  function  and  con¬ 
straints  are  used,  provided  that  the  functions  are  sufficiently  smooth 
and  there  are  no  zero  multipliers  at  the  solution.  This  last  condition 
also  ensures  that  ultimately  solving  the  IQP  and  the  EQP  are  the  same, 
so  the  only  difference  between  our  method  and  Wilson's  in  the  limit  is 
the  fact  that  we  use  the  first-order  multiplier  estimates  at  the  new 
point  (instead  of  the  multipliers  of  the  old  QP)  to  define  W  .  It  is 
shown  by  Fletcher  (1974)  that  using  the  first-order  multiplier  estimates 
does  not  inhibit  the  quadratic  convergence  of  a  projected  Lagrangian 
method. 

Using  finite-difference  approximations  to  the  Hessian  or  projected 
Hessian  is  well  known  to  have  essentially  the  same  final  rate  of  con¬ 
vergence  (to  machine  precision)  as  using  analytical  Hessians.  A  super- 
linear  rate  of  convergence  for  a  projected  Lagrangian  method  using  a  par¬ 
ticular  quasi-Newton  approximation  to  the  full  matrix  W  has  been  shown 
by  Powell  (1977). 

lU.  Computational  Results. 

In  this  section  we  illustrate  the  usefulness  of  the  algorithm  by 
presenting  some  numerical  results  for  the  case  where  finite-difference 
approximations  to  the  Hessian  of  the  Lagrangian  function  are  used. 


Six  problems  are  selected  from  the  literature.  We  solve  the  first 

four  as  problems  of  the  type  /  P  and  the  last  two  as  problems  of 

00 

the  type  MMP.  The  problems  and  their  solutions  are  as  follows: 


Problem  1:  Bard  (l97o). 


>  j  *  1>2, ... >15 


where  =  j  ,  v^  =  16-j  ,  w^  =  min(uj>v^)  and  the  vector  y 
is  given  in  Table  1. 

Starting  Point:  xQ  =  (l,l,l)T  . 

Problem  2:  Kowallk  and  Osborne  (l968). 

x- (u  2+  x_u, ) 


f  (x)  =  y  .  - ZJL 

Uj  +  X3V  *4 


>  j  *  1,2, « « • >11 


where  vectors  y  and  u  are  given  in  Table  2. 
Starting  Point:  xQ  -  (o. 25,0. 39,0. 415,0. 39)T 


TABLE  1 


TABLE  2 


i 

yi 

i 

yi 

ui 

1 

0.14 

1 

0.1957 

4.0000 

2 

0.18 

2 

0.1947 

2.0000 

3 

0.22 

3 

0.1735 

1.0000 

4 

0.25 

4 

0.1600 

0.5000 

5 

0.29 

5 

0.0844 

0.2500 

6 

0.32 

6 

0.0627 

o.l67o 

7 

0.35 

7 

0.0456 

0.1250 

8 

0.39 

8 

0.0342 

0.1000 

9 

0.37 

9 

0.0323 

0.0833 

10 

0.58 

10 

0.0235 

0.0714 

11 

0.73 

11 

0.0246 

0.0625 

12 

0.96 

13 

1.34 

lb 

2.10 

15 

4.39 

i 


Problem  3:  Madsen  (1575). 

fl(x)  -  *1  +  x^+  xxx2 
fg(x)  *  sin  x^ 
f^(x)  *  COS  Xg  . 

Starting  Point:  xQ  =  (3,l)T  . 

Problem  4:  El-Attar  et  al.  (1979)  ft  2. 


fx(x)  =  Xx2+  Xg24-  X^2-  1 

=  Xj  +  Xg-  X 

fg(*)  “  *12+  *g2+  (*3-  a)2 

t5  (x)  =  2x35+  6xg‘ 

f?(x)  -  X^  Xg>  x?-  1 

f6(x)  =  x*-  9x3 

Starting  Point:  xQ  =  (l,l,l)T  . 

Problem  5:  Charalambous  and  Handler  (1976)  ft  1 

fx(x)  *  x*+  x^ 
fg(x)  =  (2-x^)2+  (2-Xg)2 
f?(x)  =  2  exp(-x^+  Xg)  . 
Starting  Pdint:  x0  =  (l,-0.l)T  . 


.+  1 

+•  2(5x?-  Xj+  1 )2 
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Problem  6:  Rosen  and  Suzuki  (l 965). 


fx(x)  =  xx2+  x22+  2x32+  x^2-  5Xj-  5x2-  21x?*  7x^  . 
fg(x)  =  f^(x)  -  10 (-  X^2-  Xg2-  X^2-  X^2-  X^  +  Xg-  x^+  x^+  8) 
f^(x)  =  f^(x)  -  10 (-  Xj2-  2Xg2-  x^2-  2x1j2+  Xj+  x^  *■  10) 
f^(x)  =  f^(x)  -  10 (-  SXj2-  Xg2-  x^2-  2x^v  Xg+  x^+  5)  • 

Starting  Point:  xQ  =  (0,0,0,0)T  . 

(This  is  a  problem  originally  of  the  type  NCP  transformed  to  type  MMP 
by  the  introduction  of  the  penalty  parameter  10,  which  is  always  possible 
if  the  parameter  is  large  enough,  as  several  authors  have  pointed  out). 

Solutions  found; 

Problem  1,  (l^P); 

F  (x)  =  0.77601  with  x  =  (0.17757,  -  0.94295,  5.30796)T  . 

00 

(This  is  a  different  local  minimum  from  that  found  by  Watson  (l979)). 
Problem  2, (l  P): 

""  "  op— * — 

F  (x)  =  0.0080844  with  X  =  (o. 18463,  0.10521,  0.01196,  0.11179) 
00 


Problem  3,  (i  P) : 


F  (x)  =  0.61643  with  x  =  (-  0.45330,  0.9o659)T  . 

00 


F  (x)  -  3.59972  with  x  «  (0.3282 6,  0.00000,  0.13132)T 


Problem  5,  (MMP): 

Fm(x)  «  1.95222  with  x  -  (1.13904,  0.89956)T 
Problem  6,  (MMP): 

Fm(x)  -  -  44.0000  with  x  -  (0. 00000,  1.00000,  2.00000,-  1. 00000 ). 

The  results  are  summarized  in  Table  3-  The  termination  conditions 
requires  were  that  «c*2  <  10~6,  5zTen+1«2  <  ]0-6  ,  ZTWZ  numerically 
positive  semi-definite  and  Ac  >  0  .  The  line  search  accuracy  parameter  n 
was  set  to  0.9  (see  Murray  and  Overton  (1979)  for  the  definition  of  this  parameter). 
Several  other  choices  of  n  were  tried,  but  n  =  0.9  was  the  most 
efficient,  indicating  as  expected  that  a  slack  line  search  is  desirable 
at  least  on  these  problems.  The  machine  used  was  an  IBM  370/168  in 
double  precision,  i.e.,  with  16  decimal  digits  of  accuracy.  The  column 
headed  NI  reports  the  number  of  iterations  required,  which  is  also  the 
number  of  times  the  Hessian  was  approximated.  The  column  headed  NF 
gives  the  number  of  function  evaluations  (not  including  gradient  evaluations 
for  the  Hessian  approximation). 
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TABLE  3 


Problem 

n 

m 

n+1  -t 

NI 

NF 

1  (Bard) 

3 

15 

1 

11 

11 

2  (Kcwalik  and  Osborne) 

4 

11 

0 

11 

14 

3  (Madsen) 

2 

3 

1 

13 

19 

4  (El-Attar  et  al,#2) 

3 

6 

2 

7 

8 

5  (Charalambous  and  Bandler,#l) 

2 

3 

1 

6 

6 

6  (Rosen  and  Suzuki) 

4 

4 

2 

7 

10 

The  results  demonstrate  that  at 

least 

on 

a  limited 

set 

of  test 

problems  the  algorithm  fulfills  some  of  its  promise.  Final  quadratic 
convergence  was  observed  in  all  cases.  The  algorithm  has  been  tested 
on  a  wider  set  of  problems  and  results  obtained  for  a  variety  of  choices 
of  the  optional  parameters,  it  was  clear  from  these  more  extensive  results 
that  more  work  needs  to  be  done  in  developing  the  active  set  selection 
strategy.  These  results  must  therefore  be  regarded  as  preliminary. 

15*  Concluding  Remarks. 

We  conclude  with  emphasizing  the  importance  of  solving  MMP  by  a 
special  algorithm  such  as  the  one  presented  here  and  not  just  applying 
an  algorithm  for  the  general  nonlinear  programming  problem  NCP  to  the 
equivalent  problem  EMP.  The  primary  simplification  of  the  minimax  problem 
over  the  nonlinear  programming  problem  is  that  a  natural  merit  function 
is  available  to  measure  progress  towards  the  solution.  To  put  this 
another  way,  it  is  always  possible  to  reduce  in  the  line  search  and 
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obtain  a  new  feasible  point  for  problem  EMP.  Several  results  in  this 
chapter  followed  from  the  availability  of  the  natural  merit  function. 

In  particular,  consider  Theorem  4  (Section  5.l).  This  shows  that  if 
YPy  ,  the  component  of  the  search  direction  in  the  range  space  of  the 
active  constraint  Jacobian,  is  an  uphill  direction  with  respect  to  the 
minimax  function,  then  it  is  known  that  too  many  constraints  are  in  the 
active  set  and  a  constraint  with  a  positive  value  and  a  negative  multi¬ 
plier  estimate  can  be  deleted  to  obtain  a  descent  direction.  There  is 
no  analogue  of  Theorem  4  for  the  nonlinear  programming  problem  NCP, 

A 

because  c  may  have  negative  components.  If  the  vector  YP^  is  uphill 
with  respect  to  an  artificial  merit  function  such  as  a  penalty  function, 
then  it  may  be  because  there  are  too  many  constraints  in  the  active  set, 
or  it  may  be  because  the  penalty  parameter  is  not  large  enough. 

There  are  other  aspects  which  make  it  clear  that  solving  MMP  in  a 
special  way  is  advantageous.  Since  the  first-order  constraint  qualifi¬ 
cations  are  always  satisfied  (see  Section  1.2)  there  is  no  need  to  be 
concerned  over  the  existence  of  Lagrange  multipliers  when  the  active 
constraint  Jacobian  becomes  rank  deficient.  Also,  as  was  pointed  out 
in  Section  4,  problem  MMP  is  in  some  sense  naturally  scaled  and  the 
first-order  Lagrange  multiplier  estimates  can  take  advantage  of  this 
fact. 

It  should  be  clear  by  now  how  our  algorithm  is  related  to  the 
projected  Lagrangian  algorithms  which  have  been  proposed  to  solve  NCP. 
Wilson  (1963),  Robinson  (l974),  Han  ( 1977a)  and  Powell  (1977)  all  solve 
successive  inequality  constrained  QP's,  so  in  that  sense  they  are  more 
closely  related  to  the  method  of  Han  (1977b,  1978a)  than  to  our  method. 
Murray  (1969a,  1969b),  Wright  (1976)  and  Murray  and  Wright  (1978)  solve 


successive  equality-constrained  QP's.  However,  their  methods  differ 
from  the  others  and  from  ours  in  the  sense  that  they  do  not  attempt  to 
step  to  the  active  constraint  boundaries  at  every  step  but  control  how 
far  outside  or  inside  the  feasible  region  the  iterates  stay  by  means  of 
penalty  and  barrier  parameters.  This  type  of  approach  has  proved  to  be 

very  successful  for  solving  NCP  because  it  balances  the  reduction  of 
the  objective  function  with  the  reduction  of  the  constraint  violation 
in  a  satisfactory  way.  However,  this  approach  is  quite  unnecessary  for 
solving  MMP  since  it  is  always  trivial  to  obtain  a  feasible  point  for  EMP. 

To  put  it  another  way,  reducing  the  minimax  function  in  the  line  search 
always  results  in  a  step  towards  the  constraint  boundaries,  although  we 
do  not  usually  wish  to  step  exactly  to  the  boundaries  by  doing  an  exact 
line  search. 

Constrained  Problems. 

Linear  constraints  can  be  handled  by  the  algorithm  we  have  presented, 
since  t’->ey  can  be  incorporated  into  the  QP  at  each  iteration.  It  follows 
1  bove  remarks  however  that  nonlinear  constraints  cannot  be  handled 

by  ^gorithm  for  MMP  in  a  straightforward  way.  As  soon  as  nonlinear 

constraints  are  introduced  the  natural  merit  function  is  lost  and  the  problem 
takes  on  the  complexity  of  the  general  nonlinear  programming  problem  NCP. 

Of  course  nonlinear  constraints  can  still  be  handled  by  nonlinear  pro¬ 
gramming  methods,  but  it  is  important  to  recognize  the  increase  in  complexity. 
Clearly  the  best  approach  would  be  one  which  takes  advantage  of  the  minimax 
structure  and  introduces  an  artificial  merit  function  dealing  with  the 
genuine  nonlinear  constraints  and  not  with  those  of  EMP. 
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